User:Abby.Martin

From REU@MU
Revision as of 20:39, 24 June 2015 by Abby.Martin (Talk | contribs)

Project

Mentor: Dr. Richard Povinelli

I will research the application and accuracy of linear regression model trees, testing how effective they are for electric load forecasting. I also plan to compare this approach with other forecasting methods that have been tried.

Goals & Milestones

  • Test the influence of linear regression model trees on the accuracy of electric load forecasting.
  • Determine whether linear regression model trees forecast electric load better than other methods.
  • Create a linear regression model tree using MATLAB or WEKA for use in the electric portion of GasDay, and apply the methods found or created to data from the GasDay lab.
  • Research linear regression model trees and electrical usage.
  • Continue research on linear regression model trees, electrical usage, WEKA and MATLAB. Start creating methods for forecasting electrical use using linear regression model trees.
  • Continue research on linear regression model trees and electrical usage. Also learn how to effectively use MATLAB and WEKA.
  • Test my linear regression model tree with real data and compare its effectiveness with that of other forecasting methods.

Weekly Goals

Week One

  • Read and research papers that address the following topics:
    • Decision Trees
    • Machine Learning
    • Model Trees
    • Linear Regression Model Trees
    • Electric Load Forecasting and other methods that have been used

Week Two

  • Continue reading about linear regression model trees
  • Begin testing various datasets using the WEKA software
  • Begin reading some of the source code and documentation to better understand WEKA
  • Begin learning, using, and applying MATLAB

Week Three

  • Test real data using the WEKA software to create various model trees.
  • Read Data Mining: Practical Machine Learning Tools and Techniques by Ian H. Witten, Eibe Frank, and Mark A. Hall to better understand model trees and the WEKA software
  • Become comfortable with the MATLAB software

Week Four

  • Begin testing converted data using WEKA software
  • Create various model trees and compare results
  • Prepare mini presentation

Week Five

  • Give mini presentation
  • Continue testing and comparing model trees
  • Create other forecasting models for comparison

Week Six

  • Continue comparisons of models
  • Begin writing paper and creating poster

Week Seven

  • Continue research and comparisons
  • Continue work on paper and poster

Week Eight

  • Complete poster
  • Continue work on paper

Week Nine

  • Complete paper
  • Prepare for final talk

Week Ten

  • Finalize paper
  • Poster session
  • Formal talk

Weekly Log

Week One

  • Orientation activities and forms
  • Pre-REU Survey
  • Attended GasDay Camp
  • Met Dr. Povinelli and decided on research topic
  • Read papers on my topic to discover:
    • the definition and application of decision trees
    • difference between classification trees, regression trees, and model trees
    • machine learning and how trees split

Week Two

  • Met with Dr. Povinelli to further discuss concepts and goals
    • He explained:
      • the "greedy" approach
      • the various ways to determine the "best" variable and tree
    • He suggested reading about the M5P Model
  • Read about the M5P Model and learned:
    • Splits using a Standard Deviation Reduction Method
    • Uses a smoothing method for leaves
    • Article contained helpful pseudocode for understanding the process of creating a linear regression model tree
  • Began working with the WEKA software to create linear regression model trees
  • Read some of the source code from WEKA to understand the linear regression model tree creation process
  • Began working with and learning MATLAB
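
As an illustration of the standard deviation reduction idea noted above, here is a minimal Python sketch for a single numeric attribute. It follows the usual textbook definition (the standard deviation of the targets at a node minus the size-weighted standard deviations of the two subsets); it is not taken from the WEKA source, and the function names are hypothetical.

```python
import math


def std_dev(values):
    """Population standard deviation of a list of numbers (0.0 if empty)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def sdr(targets, left, right):
    """Standard deviation reduction from splitting targets into left/right."""
    n = len(targets)
    weighted = sum(len(part) / n * std_dev(part) for part in (left, right))
    return std_dev(targets) - weighted


def best_split(xs, ys):
    """Return (threshold, sdr) for the best binary split on one attribute.

    Candidate thresholds are midpoints between consecutive distinct
    attribute values; (None, 0.0) is returned if no split reduces the
    standard deviation.
    """
    pairs = sorted(zip(xs, ys))
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal attribute values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = sdr(ys, left, right)
        if gain > best[1]:
            best = (threshold, gain)
    return best
```

A greedy tree builder would call something like best_split for every attribute at a node, keep the attribute with the largest reduction, and recurse on the two subsets.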

Week Three

  • Monday, June 15th
    • Read Data Mining: Practical Machine Learning Tools and Techniques and gained a fuller, more comprehensive understanding of data mining, model trees, and the WEKA software
  • Tuesday, June 16th
    • Attended required talk on Good Presentations, Good Technical Writing, and the Difference Between Them
    • Continued reading Data Mining: Practical Machine Learning Tools and Techniques
    • Organized research and created goals for upcoming weeks
  • Wednesday, June 17th
    • Tested data using the WEKA software to compare linear regression model trees against standard linear regression
    • Found a plugin that gives MATLAB the capability of generating linear regression model trees
    • Began learning and practicing how to use MATLAB
    • Learned about and installed the M5PrimeLab plugin for MATLAB
  • Thursday, June 18th
    • Attended talk on Responsible Conduct of Research
    • Did the interactive movie project that was assigned to accompany RCR training
    • Successfully implemented the M5PrimeLab plugin to create linear regression model trees
  • Friday, June 19th
    • Continued with reading of Data Mining: Practical Machine Learning Tools and Techniques
    • Used WEKA to compare the M5P model against standard linear regression, and to compare cross-validation against the percentage split technique
    • Downloaded and began learning how to use the typesetting software LaTeX
    • Met with Dr. Povinelli to discuss progress
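
To make the difference between the two evaluation techniques compared this week concrete, the sketch below generates the index sets each one uses. It is an illustrative Python version, not WEKA's code: the function names, the seed, and the 66% default share are assumptions chosen for the example.

```python
import random


def percentage_split(n, train_fraction=0.66, seed=1):
    """Shuffle n instance indices once and cut them into train/test lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = round(n * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_splits(n, k=10, seed=1):
    """Yield (train, test) index lists for k-fold cross-validation.

    Every instance appears in exactly one test fold, so each instance is
    used for testing once and for training k - 1 times.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

A percentage split trains and evaluates one model, while k-fold cross-validation trains k models and averages their errors, which costs more time but uses every instance for testing.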

Week Four

  • Monday, June 22nd
    • Converted all MATLAB data files to ARFF format
    • Began work on mini presentation
    • Began testing data using Weka to create model trees
  • Tuesday, June 23rd
    • Created and ran 10-fold cross-validation tests with a minimum of 10 instances at each leaf for each data set
    • Created and ran 66% split tests with a minimum of 10 instances at each leaf for each data set
    • Created and tested a linear regression (LR) model with a training set and a test set
    • Evaluated these various models
    • Worked on mini presentation
  • Wednesday, June 24th
    • Created and evaluated several more models of linear regression model trees
    • Generated models for various load factors
    • Created presentation for seminar
  • Thursday, June 25th
    • Presented research thus far at seminar
    • Meet for lunch and discuss progress and challenges thus far
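
For reference, the ARFF conversion done on Monday can be pictured with the small Python sketch below. It writes only numeric attributes, and the relation and attribute names in the usage note are made up for illustration; the actual conversion was done from MATLAB data files and may have been carried out differently.

```python
def to_arff(relation, attributes, rows):
    """Build ARFF text for a table whose columns are all numeric.

    relation   -- name placed on the @relation line
    attributes -- list of column names, one @attribute line each
    rows       -- list of rows, each a list of numbers in column order
    """
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in attributes]
    lines += ["", "@data"]
    lines += [",".join(str(value) for value in row) for row in rows]
    return "\n".join(lines) + "\n"
```

For example, to_arff("load", ["temperature", "load"], [[20.5, 310.0]]) returns a header with two @attribute lines followed by an @data section containing the row 20.5,310.0.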

Week Five

  • Thursday, July 2nd
    • Mini Presentations
      • Informal (though serious) description of what we have been doing, to receive feedback in preparation for the formal presentations in Week 10

Week Six

  • Thursday, July 9th
    • Meet for lunch and discuss progress and challenges thus far

Week Seven

  • Thursday, July 16th
    • Meet for lunch and discuss progress and challenges thus far

Week Eight

  • Thursday, July 23rd
    • Meet for lunch and discuss progress and challenges thus far

Week Nine

  • Wednesday, July 29th
    • Electronic version of poster due
  • Thursday, July 30th
    • Meet for lunch and discuss progress and challenges thus far

Week Ten

  • Tuesday, August 4th
    • Poster Session
  • Wednesday, August 5th
    • First half of the formal presentations
  • Thursday, August 6th
    • Second half of the formal presentations
  • Friday, August 7th
    • Post REU Survey and Final Instructions
    • Research Papers Due