
Technical Report for Energy Load Forecasting Competition

December 09, 2021

Estimated reading time: 10 mins

Executive Summary

In this project, I forecast hourly electricity load based on historical load from 1/1/2008 – 12/31/2012. I not only used the models from class, including Decomposition – ARIMA, Decomposition – ETS, neural network (nnetar), the ETS smoothing model, the naïve model, the seasonal naïve model, auto ARIMA, and a linear regression model, but also tried methods from the “Cases with Codes for SCE project,” which include random forest, GBM, AutoML, and XGBoost (1)(3). The best model is XGBoost, with a MAPE of 2.96% on the test data (year 2012).

Introduction

Data cleaning

[Figure: data cleaning code]

EDA

[Figures: EDA plots]

Besides the ggplot approach we learned in class, I also tried the plotting approach from the Kaggle notebook mentioned at the beginning (1). The graphs are more compact; the results are shown below.

[Figures: compact EDA plots]

Based on the graphs, I observe strong daily, weekly, and yearly seasonality.

Feature engineering

In this step, I did some feature engineering for the linear regression and machine learning models; a sketch of the idea follows the screenshot below.

[Figure: feature engineering code]
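In R, the features look roughly like this (a sketch only; `df`, its `datetime` and `load` columns, and the lag choices are my assumptions based on the description above):

library(dplyr)
library(lubridate)

# Sketch: calendar and lag features, assuming a data frame `df` with a
# POSIXct `datetime` column and a numeric `load` column.
df <- df %>%
  mutate(
    hour    = hour(datetime),     # hour of day -> daily pattern
    weekday = wday(datetime),     # day of week -> weekly pattern
    month   = month(datetime),    # month -> yearly pattern
    lag24   = lag(load, 24),      # same hour yesterday
    lag168  = lag(load, 168),     # same hour last week
    lag8760 = lag(load, 8760)     # same hour last year (approx.)
  )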

Methodology

After preparing the data, the next step is to choose a champion model based on MAPE. More specifically, I split the data into training and test sets. Since the 2012 data is held out for the final forecast, I train on the years before 2011 and test on year 2011, then compare the models' MAPEs on the test data.

Create time series object

Since the data has daily, weekly, and yearly seasonality, I create a multi-seasonal time series:

[Figure: multi-seasonal time series creation]
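With the forecast package this is a one-liner (a sketch; `df$load` is the assumed source column):

library(forecast)

# Hourly data with three seasonal periods: a day (24), a week (168),
# and an average year (365.25 * 24 = 8766 hours).
load_msts <- msts(df$load, seasonal.periods = c(24, 24 * 7, 24 * 365.25))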

Split train and test

Training: 1/1/2008 – 12/31/2010

Testing: 1/1/2011 – 12/31/2011

Using the window function:

[Figures: window() train/test split code]
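The screenshots use window(); an equivalent index-based split with forecast's subset() is sketched below (my reconstruction, which avoids fractional-time arithmetic on the msts object):

# 2008-2010 span 366 + 365 + 365 days of hourly observations (2008 is a leap year).
n_train <- 24 * (366 + 365 + 365)

train <- subset(load_msts, end = n_train)                                  # 2008-2010
test  <- subset(load_msts, start = n_train + 1, end = n_train + 24 * 365)  # 2011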

Split the data for the regression and machine learning models:

[Figure: data split for the regression and ML models]
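The matching row-index split of the feature data frame (a sketch, reusing n_train from above):

train_df <- df[1:n_train, ]
test_df  <- df[(n_train + 1):(n_train + 24 * 365), ]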

Model Comparison

Models used and graphs:

MSTL – ARIMA:

[Figures: MSTL – ARIMA forecast results]

MSTL – ETS:

Both the MSTL – ARIMA and MSTL – ETS models produce fairly good results, as they are able to capture the multiple seasonalities in our data. The best MAPE of the two is 8.92 on the test set (year 2011).

[Figures: MSTL – ETS forecast results]
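A minimal sketch of these two fits (my reconstruction; stlf() is one standard way in the forecast package to combine an mstl() decomposition with an ARIMA or ETS forecast, and the exact calls are in the screenshots):

fc_mstl_arima <- stlf(train, method = "arima", h = length(test))
fc_mstl_ets   <- stlf(train, method = "ets",   h = length(test))

accuracy(fc_mstl_arima, test)  # MAPE on the 2011 hold-out
accuracy(fc_mstl_ets,   test)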

Nnetar:

[Figures: nnetar forecast results]
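The nnetar fit follows the usual forecast-package pattern (a sketch):

# NNAR: a feed-forward network with lagged values of the series as inputs.
fit_nn <- nnetar(train)
fc_nn  <- forecast(fit_nn, h = length(test))
accuracy(fc_nn, test)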

ETS model:

Both the nnetar and ETS models fit the training data well, achieving MAPEs below 2. However, both produce a nearly flat line on the test data, since neither captures the long seasonal patterns over a year-ahead horizon.

[Figures: ETS forecast results]

Naïve and seasonal naïve models:

[Figures: naïve and seasonal naïve forecast results]

Auto ARIMA:

For the auto ARIMA model, I first built it without changing the default p, d, q, P, D, Q values. As the graph below shows, it produced a straight line. Therefore, I used loops to search over the p, d, q, P, D, Q values.

[Figures: default auto.arima forecast results]

The first loop decides the orders of the non-seasonal part, while the second loop decides the orders of the seasonal part.

[Figures: auto.arima order-search loops]
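The shape of that search is roughly the following (an illustrative sketch: train_24 is assumed to be the training series re-cast at frequency 24 so the seasonal part stays tractable, and the order ranges are placeholders, not the exact grid in the screenshots):

best_mape <- Inf
for (p in 0:2) {
  for (q in 0:2) {
    fit  <- Arima(train_24, order = c(p, 1, q), seasonal = c(1, 1, 1))
    fc   <- forecast(fit, h = length(test))
    mape <- mean(abs((as.numeric(test) - fc$mean) / as.numeric(test))) * 100
    if (mape < best_mape) {
      best_mape <- mape
      best_fit  <- fit
    }
  }
}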

Linear regression model:

By adding daily, weekly, and yearly lags to the model, the performance improved to a MAPE of 6.4938 on the test data.

Inspired by Tao Hong's paper mentioned in the Q&A (2), I added features like HourT^2, HourT^3, MonthT^2, MonthT^3, and Day*Hour. These features improved my MAPE to 5.7633.

[Figures: linear regression forecast results]
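A sketch of what such a regression looks like, using the columns engineered earlier (my reconstruction; the exact formula is in the screenshots):

fit_lm <- lm(
  load ~ lag24 + lag168 + lag8760 +
    hour + I(hour^2) + I(hour^3) +
    month + I(month^2) + I(month^3) +
    weekday:hour,
  data = train_df
)
pred_lm <- predict(fit_lm, newdata = test_df)
mean(abs((test_df$load - pred_lm) / test_df$load)) * 100  # MAPE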

Machine learning models:

The following methods are learned from the Kaggle competition article (1). These machine learning models achieve lower MAPEs.

Random Forest:

[Figure: random forest results]

GBM:

[Figure: GBM results]
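A minimal sketch of the random forest and GBM fits, assuming the h2o framework (consistent with the leaderboard/leader interface used in the AutoML step below) and the train_df/test_df frames from earlier:

library(h2o)
h2o.init()

train_h2o <- as.h2o(train_df)
test_h2o  <- as.h2o(test_df)
predictors <- setdiff(colnames(train_df), c("datetime", "load"))

# Random forest and gradient boosting on the engineered features.
fit_rf  <- h2o.randomForest(x = predictors, y = "load", training_frame = train_h2o)
fit_gbm <- h2o.gbm(x = predictors, y = "load", training_frame = train_h2o)

pred_rf <- h2o.predict(fit_rf, test_h2o)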

AutoML:

This model runs through several candidate models and records a summary on its “leaderboard”. By calling the “leader” from the “leaderboard”, we can get the best model, which is XGBoost.

[Figure: AutoML code]

As the graphs below show, AutoML reports that the best model is XGBoost with 536 trees, with a MAPE of 3.3 and an RMSE of 146.11.

[Figures: AutoML leaderboard and leader summary]
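The corresponding calls look roughly like this (a sketch, assuming h2o as above; max_models is a placeholder, not the setting actually used):

aml <- h2o.automl(x = predictors, y = "load",
                  training_frame = train_h2o, max_models = 20)

print(aml@leaderboard)  # summary of every model tried
best <- aml@leader      # best-ranked model (XGBoost here)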

XGBoost:

In the XGBoost model, I added the useful features created previously for the regression. The lowest MAPE it achieved is 3.4 on the test set.

[Figure: XGBoost results]
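A sketch of the fit with the xgboost R package (my reconstruction; nrounds is a placeholder, not the tuned value):

library(xgboost)

# xgboost takes numeric matrices; drop the timestamp and target columns.
x_train <- as.matrix(train_df[, predictors])
x_test  <- as.matrix(test_df[, predictors])

fit_xgb <- xgboost(data = x_train, label = train_df$load,
                   nrounds = 500, objective = "reg:squarederror")
pred_xgb <- predict(fit_xgb, x_test)
mean(abs((test_df$load - pred_xgb) / test_df$load)) * 100  # MAPE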

NN in Python:

I also tried a neural network in Python. The code follows a TensorFlow tutorial (5).

[Figure: Python neural network results]

XGBoost in Python:

[Figure: Python XGBoost results]

Model Selection

[Chart: model MAPE comparison]

Based on the chart above and the analysis, the champion model is XGBoost.

Results/Analysis

I used the champion model, XGBoost, to predict the hourly demand in 2012. In this step, I used the data before 2012 as training data and the 2012 data as test data.

In order to improve the performance of my XGBoost model, I tried different features. Inspired by Tao Hong's paper, I recorded the features used in each XGBoost model and their test MAPE on the 2012 data.

[Figures: feature sets tried and their 2012 test MAPEs]

I also tried XGBoost in Python, following the link in the Q&A (3).

[Figures: Python XGBoost code and results]

Based on the features tried in R, I got a MAPE of 4.07 in Python using XGBoost. In order to further improve my model, I did more research on how to forecast energy load. After reading some articles, I thought adding “holidays” might be useful. Temperature can also be a factor affecting power usage, so I added lags, mean, std, max, and min of the temperature. The coding part follows a Kaggle article (4). Finally, I achieved a MAPE of 2.96%.
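As an R illustration of those extra features (my own sketch; the actual implementation followed the Python notebook in (4), and holiday_dates and the temp column are assumed names):

library(dplyr)
library(zoo)

# Holiday flag plus lagged and rolling temperature statistics.
# `holiday_dates` is an assumed vector of holiday dates.
df <- df %>%
  mutate(
    is_holiday = as.integer(as.Date(datetime) %in% holiday_dates),
    temp_lag24 = lag(temp, 24),
    temp_mean  = rollapply(temp, 24, mean, fill = NA, align = "right"),
    temp_std   = rollapply(temp, 24, sd,   fill = NA, align = "right"),
    temp_max   = rollapply(temp, 24, max,  fill = NA, align = "right"),
    temp_min   = rollapply(temp, 24, min,  fill = NA, align = "right")
  )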

[Figure: final model results]


References

(1) https://www.kaggle.com/goldens/hourly-energy-consumption-time-series-analysis

(2) https://repository.lib.ncsu.edu/bitstream/handle/1840.16/6457/etd.pdf?sequence=2&isAllowed=y

(3) https://www.kaggle.com/robikscube/tutorial-time-series-forecasting-with-xgboost/notebook

(4) https://www.kaggle.com/sayedathar11/time-series-forecasting-xgboost-lags-and-rolling

(5) https://www.tensorflow.org/tutorials/keras/regression

