User:Abby.Martin
From REU@MU
Revision as of 22:13, 30 July 2015 by Abby.Martin (Talk | contribs)
Project
Mentor: Dr. Richard Povinelli
I will be researching the application and accuracy of linear regression model trees. I aim to test the effectiveness of this method in assisting with electric load forecasting. I also plan to compare this method of forecasting with the current method used by the GasDay lab.
Goals & Milestones
- Test the influence of linear regression model trees on the accuracy of electrical use forecasting.
- Determine whether linear regression model trees are a better method of electric load forecasting than other methods.
- Research linear regression model trees and electrical usage.
- Continue research on linear regression model trees, electrical usage, WEKA and MATLAB. Start creating methods for forecasting electrical use using linear regression model trees.
- Continue research on linear regression model trees and electrical usage. Also learn how to effectively use MATLAB and WEKA.
- Test my linear regression model tree with real data and compare its effectiveness with that of other forecasting methods.
- Create a model tree that is as good as or better than the current model being used for electrical forecasting
Weekly Goals
Week One
- Read and research papers that address the following topics:
- Decision Trees
- Machine Learning
- Model Trees
- Linear Regression Model Trees
- Electric Load Forecasting and other methods that have been used
Week Two
- Continue reading about linear regression model trees
- Begin testing various datasets using the WEKA software
- Begin reading some of the source code and documentation to better understand WEKA
- Begin learning, using, and applying MATLAB
Week Three
- Test real data using the WEKA software to create various model trees.
- Read Data Mining: Practical Machine Learning Tools and Techniques by Ian H. Witten, Eibe Frank, and Mark A. Hall to better understand model trees and the WEKA software
- Become comfortable with the MATLAB software
Week Four
- Begin testing converted data using WEKA software
- Create various model trees and compare results
- Prepare mini presentation
Week Five
- Give mini presentation
- Continue testing and comparing model trees
- Create other forecasting models for comparison
Week Six
- Continue comparisons of models
- Begin writing paper and creating poster
Week Seven
- Continue research and comparisons
- Continue work on paper and poster
Week Eight
- Complete poster
- Continue work on paper
Week Nine
- Complete paper
- Prepare for final talk
Week Ten
- Finalize paper
- Poster session
- Formal talk
Weekly Log
Week One
- Orientation activities and forms
- Pre-REU Survey
- Attended GasDay Camp
- Met Dr. Povinelli and decided on research topic
- Read papers on my topic to discover:
- the definition and application of decision trees
- difference between classification trees, regression trees, and model trees
- machine learning and how trees split
Week Two
- Met with Dr. Povinelli to further discuss concepts and goals
- Explained:
- the "greedy" approach
- the various ways to determine the "best" variable and tree
- suggested reading about the M5P Model
- Read about the M5P Model and learned:
- Splits using a Standard Deviation Reduction Method
- Uses a smoothing method for leaves
- Article contained helpful pseudocode for understanding the process of creating a linear regression model tree
- Began working with the WEKA software to create linear regression model trees
- Read some of the source code from WEKA to understand the linear regression model tree creation process
- Began working with and learning MATLAB
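The standard deviation reduction (SDR) criterion mentioned above can be sketched in a few lines. This is a minimal illustration of the published M5 idea, not WEKA's implementation: a candidate split is scored by how much it lowers the standard deviation of the target, weighting each child by its share of the rows. The function names `sdr` and `best_threshold` are hypothetical.

```python
from statistics import pstdev


def sdr(targets, left_idx, right_idx):
    """Standard deviation reduction for one candidate binary split."""
    def sd(idx):
        values = [targets[i] for i in idx]
        return pstdev(values) if len(values) > 1 else 0.0

    n = len(targets)
    parent_sd = pstdev(targets)
    # Each child's standard deviation is weighted by its fraction of the rows.
    weighted_child_sd = sum(len(idx) / n * sd(idx) for idx in (left_idx, right_idx))
    return parent_sd - weighted_child_sd


def best_threshold(feature, targets):
    """Try midpoints between sorted feature values and keep the largest SDR."""
    order = sorted(range(len(feature)), key=lambda i: feature[i])
    best = (None, float("-inf"))
    for pos in range(1, len(order)):
        lo, hi = feature[order[pos - 1]], feature[order[pos]]
        if lo == hi:
            continue  # no threshold can separate identical feature values
        gain = sdr(targets, order[:pos], order[pos:])
        if gain > best[1]:
            best = ((lo + hi) / 2, gain)
    return best
```

A greedy tree builder would call `best_threshold` for every input variable at a node, split on the variable with the largest SDR, and recurse until a leaf holds too few instances.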
Week Three
- Monday, June 15th
- Read Data Mining: Practical Machine Learning Tools and Techniques and gained a much fuller and more comprehensive understanding of data mining, model trees, and the WEKA software
- Tuesday, June 16th
- Attended required talk on Good Presentations, Good Technical Writing, and the Difference Between Them
- Continued reading Data Mining: Practical Machine Learning Tools and Techniques
- Organized research and created goals for upcoming weeks
- Wednesday, June 17th
- Tested data using the Weka software to compare linear regression model trees against standard linear regression
- Found a plugin that will give MATLAB the capability of generating linear regression model trees
- Began learning and practicing how to implement MATLAB
- Learned about and implemented the M5PrimeLab plugin for MATLAB
- Thursday, June 18th
- Attended talk on Responsible Conduct of Research
- Did the interactive movie project that was assigned to accompany RCR training
- Successfully implemented the M5PrimeLab plugin to create linear regression model trees
- Friday, June 19th
- Continued with reading of Data Mining: Practical Machine Learning Tools and Techniques
- Used Weka to compare the M5P model against standard linear regression and also to compare the cross-validation techniques against the percentage split technique.
- Downloaded and began learning how to use the typesetting software LaTeX
- Met with Dr. Povinelli to discuss progress
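The two evaluation schemes compared this week differ only in how rows are assigned to training and testing. The sketch below shows that difference with plain index lists; it is an illustration, not how Weka partitions data internally. The function names, the default seed, and the shuffling step are assumptions.

```python
import random


def percentage_split(n_rows, train_fraction=0.66, seed=1):
    """Shuffle row indices once and cut them into train and test parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = round(n_rows * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_splits(n_rows, k=10, seed=1):
    """Yield (train, test) index lists so each row is tested exactly once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train, test
```

A percentage split trains one model and tests it on the held-out rows, while k-fold cross-validation trains k models and averages their test errors, which costs more time but uses every row for testing.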
Week Four
- Monday, June 22nd
- Converted all MATLAB data files to arff format
- Began work on mini presentation
- Began testing data using Weka to create model trees
- Tuesday, June 23rd
- Created and tested 10-fold cross-validation tests with a minimum of 10 instances at each leaf for each data set
- Created and tested 66% split with a minimum of 10 instances at each leaf for each data set
- Created and tested an LR model with training set and test set
- Evaluated these various models
- Worked on mini presentation
- Wednesday, June 24th
- Created and evaluated several more models of linear regression model trees
- Generated models for various load factors
- Created presentation for seminar
- Thursday, June 25th
- Presented research thus far at seminar
- Met for lunch and discussed progress and challenges thus far
- Began researching and comprehending various methods of error analysis
- Met with Dr. Povinelli to discuss research and how to give a presentation
- Began developing and evaluating trees with a minimum of 100 instances at each leaf
- Friday, June 26th
- Began developing and evaluating trees with a minimum of 350 instances at each leaf
- Created and evaluated a model tree of only temperature and load
- Created the appropriate graphs to visualize the above tree
- Changed the temperature and load tree to unsmoothed and repeated the above graphing process
- Compared the smoothed and not smoothed trees
- Normalized the real data so it can be presented
- Saturday, June 27th
- Worked on mini presentation
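Weka reads the ARFF format used for the converted data: a header naming the relation and each attribute, followed by comma-separated rows. The sketch below writes numeric tables in that layout. It assumes the rows have already been loaded from the MATLAB files; the helper name `to_arff` and the example column names are hypothetical.

```python
def to_arff(relation, attribute_names, rows):
    """Return ARFF text for a table of numeric columns."""
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in attribute_names]
    lines += ["", "@data"]
    for row in rows:
        if len(row) != len(attribute_names):
            raise ValueError("row length does not match attribute count")
        # ARFF marks a missing value with a question mark.
        lines.append(",".join("?" if v is None else repr(float(v)) for v in row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_arff("load", ["temperature", "load"], [[30, 1.5], [None, 2]]))
```

Attribute names containing spaces would need quoting, which this sketch does not handle.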
Week Five
- Monday, June 29th
- Normalized a data set so results are appropriate for presenting
- Wrote program to export actual and predicted values generated by Weka to an Excel file so appropriate statistics can be calculated
- Wrote program to calculate the mean absolute percent error of the tests
- Tested the data at various minimum numbers of instances and evaluated
- Met with Dr. Povinelli to discuss progress
- Continued work on mini presentation
- Tuesday, June 30th
- Prepared for seminar presentation and delivered presentation
- Finished normalizing data
- Calculated MAPE, RMSE, and total number of leaves for data with minimum number of instances of 350 and 500.
- Wednesday, July 1st
- Calculated MAPE, RMSE, and total number of leaves for data with minimum number of instances of 600, 750, 1000, 3500, and 5000.
- Normalized remaining data sets and separated factors for individual testing.
- Finalized mini presentation
- Thursday, July 2nd
- Mini Presentations
- Informal (though serious) description of what we have been doing to receive feedback in preparation for the formal presentations in Week 10
- Converted date to day of the week and day of the year with sine input
- Friday, July 3rd
- Added data points for weekend versus weekday, hourly, and previous day load data
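The two error measures reported this week have standard definitions, sketched here for paired lists of actual and predicted values. This is not the program written for the project, which exported Weka output to Excel; the function names are illustrative.

```python
from math import sqrt


def mape(actual, predicted):
    """Mean absolute percent error, in percent; actual values must be nonzero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the data."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

MAPE is scale-free, so it is not changed by normalizing that only rescales the data, while RMSE is in the units of the load and penalizes large misses more heavily.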
Week Six
- Monday, July 6th and Tuesday, July 7th
- Read article on feed-forward neural networks and identified additional factors that contributed to reducing error
- Analyzed dates with the highest error and found them to be holidays
- Started creating methods to flag holidays and hopefully reduce error
- Completed the RCR quizzes to finish training
- Wednesday, July 8th
- Analyzed largest errors in prediction to find trends in temperature and date
- Read Philip Brierley's dissertation to get ideas on how to handle these anomalies
- Added difference between previous day and current day as an input
- Thursday, July 9th
- Met for lunch and discussed progress and challenges thus far
- Met with Dr. Povinelli to discuss progress thus far and how to write a research paper and create a poster
- Implemented new method to handle holidays
- Friday, July 10th
- Worked on solutions to holiday problem and drastic change in temperature problem
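The calendar and lag inputs added in Weeks Five and Six can be sketched as follows. The log does not give the exact encodings, so everything here is an assumption for illustration: the sine period of 365.25 days, the `HOLIDAYS` set, and the choice of which daily series the previous-day difference is taken over.

```python
from datetime import date
from math import pi, sin

# Hypothetical holiday list for illustration only; the log does not say
# which dates were flagged.
HOLIDAYS = {date(2015, 1, 1), date(2015, 7, 4), date(2015, 12, 25)}


def calendar_features(day):
    """Calendar inputs for one date."""
    day_of_year = day.timetuple().tm_yday
    return {
        "day_of_week": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
        "day_of_year_sin": sin(2 * pi * day_of_year / 365.25),
        "is_weekend": int(day.isoweekday() >= 6),
        "is_holiday": int(day in HOLIDAYS),
    }


def previous_day_difference(daily_values):
    """Difference between each day and the day before; the first day has none."""
    return [None] + [
        today - yesterday
        for yesterday, today in zip(daily_values, daily_values[1:])
    ]
```

The sine term lets a model treat late December and early January as neighbors, which a raw day-of-year number would place at opposite ends of the scale.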
Week Seven
- Monday, July 13th
- Added inputs of HDD for 65 and 55 and CDD for 65 and 75
- Continued to address the holiday errors and change in temperature errors
- Met with Dr. Povinelli; first outline approved
- Worked on more detailed paper outline
- Tuesday, July 14th
- Finished detailed paper outline
- Compared new tests with 300 minimum number of instances and 600 minimum number of instances
- Began creating graphs for poster
- Wednesday, July 15th
- Continued creating graphs for poster, paper, and to help visualize possible steps to reduce error
- Thursday, July 16th
- Met for lunch and discussed progress and challenges thus far
- Met with Dr. Povinelli to discuss detailed outline and model that uses previous hour data
- Began work on poster
- Began work on paper rough draft
- Friday, July 17th
- Worked on paper rough draft
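Heating and cooling degree days (HDD and CDD) measure how far the temperature falls below or rises above a reference temperature. The sketch below computes the four inputs named this week. It assumes the reference values are in degrees Fahrenheit and that a single mean temperature per period is available; the function and key names are illustrative.

```python
def hdd(mean_temp_f, reference=65.0):
    """Heating degree days: degrees below the reference, never negative."""
    return max(0.0, reference - mean_temp_f)


def cdd(mean_temp_f, reference=65.0):
    """Cooling degree days: degrees above the reference, never negative."""
    return max(0.0, mean_temp_f - reference)


def degree_day_inputs(mean_temp_f):
    """The four degree-day inputs: HDD at 65 and 55, CDD at 65 and 75."""
    return {
        "HDD65": hdd(mean_temp_f, 65.0),
        "HDD55": hdd(mean_temp_f, 55.0),
        "CDD65": cdd(mean_temp_f, 65.0),
        "CDD75": cdd(mean_temp_f, 75.0),
    }
```

Using two reference temperatures per direction gives the model a piecewise-linear view of temperature, so mild and extreme days can receive different slopes.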
Week Eight
- Tuesday, July 21st
- Performed Student's t-tests to determine whether the differences between successive tests are statistically significant
- Continued work on poster
- Wednesday, July 22nd
- Finished first draft of poster
- Met with Dr. Povinelli to get feedback on poster
- Thursday, July 23rd
- Attended GasDay seminar
- Met for lunch and discussed progress and challenges thus far
- Met with Dr. Povinelli and the rest of the electricity team to discuss work to be done
- Continued work on poster
- Friday, July 24th
- Finished the second draft of the poster
- Continued work on paper
- Saturday, July 25th
- Finished Poster
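The log does not say which form of the t-test was used this week. As one possibility, the sketch below computes a paired t statistic, which fits the case where two models are scored on the same test rows. The pairing and the function name are assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(errors_a, errors_b):
    """t statistic and degrees of freedom for paired samples."""
    if len(errors_a) != len(errors_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1
```

The statistic is then compared against a Student's t distribution with `n - 1` degrees of freedom to obtain a p-value; that lookup is left out here because it needs a statistics library or a table.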
Week Nine
- Monday, July 27th
- Met with Saber to discuss research and how he can implement it with the current forecasting method
- Researched how to integrate Weka into my own Java code and worked on writing my own method that creates and evaluates a model tree
- Sent Dr. Povinelli finalized poster to be approved
- Tuesday, July 28th
- Met with Saber again to further delve into my code and what code is left to be written
- Continued to research and create a method that integrates Weka
- Wednesday, July 29th
- Electronic version of poster due
- Finished first rough draft of research paper
- Thursday, July 30th
- Attended GasDay Seminar
- Met for lunch and discussed progress and challenges thus far
- Began work on final presentation
Week Ten
- Tuesday, August 4th
- Poster Session
- Wednesday, August 5th
- First half of the formal presentations
- Thursday, August 6th
- Second half of the formal presentations
- Friday, August 7th
- Post REU Survey and Final Instructions
- Research Papers Due