Energy forecasts with hetida designer

Using hetida designer, the interactive Python workflow editor of the hetida platform, we develop a model for forecasting the electricity consumption of an industrial company. Beyond this practical question, the main focus is on how hetida designer can be applied and operated.

How can electricity consumption be forecast?

Not only industrial companies but also municipal utilities buy their electricity on the financial markets, sometimes years in advance. The more accurately consumption can be forecast for a specific point in time, the better: fewer expensive short-term additional purchases are needed, and less electricity that has already been purchased has to be resold at worse conditions. Besides the time of day, intuitive factors that (can) influence power consumption are:
  • Which day of the week is being considered? Does the day fall on a weekend?
  • Are there school vacations on that day?
  • Is it a public holiday?
We look at the load profile of a manufacturing company in the Swiss canton of Aargau in 2016, which includes quarter-hourly electricity consumption values. Based on this data, we answer the following question:

What is the company’s electricity consumption on May 25, 2017 between 10 a.m. and 11 a.m.?

We use hetida designer to forecast electricity consumption. We start our analysis by visualizing the available data. In the next step, data preparation, we extend the data set with the temporal influencing factors mentioned above. We then train a model that reconstructs the 2016 load profile as accurately as possible, comparing a linear regression with a random forest, a widely used machine learning algorithm. Finally, we use the random forest, trained on the 2016 data, to predict May 25, 2017.

Data visualization

Plot of the load profile from 2016
The typical five-day week of an industrial company is clearly visible. At some points, for example in May, July or at the end of December, the pattern deviates. We first want to explore and explain these deviations and then take them into account in the forecast for 2017.
Workflow for visualizing the load profile
First, the data is read from the database as a time series (see following figure) and converted into a data frame, in which the power consumption column is named ‘Values’. The second component plots the data.
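The first step of this workflow can be sketched in pandas as follows. This is a minimal sketch, not the code of the hetida components: the database query is replaced by a synthetic quarter-hourly series, and the helper name `series_to_frame` is hypothetical.

```python
import numpy as np
import pandas as pd


def series_to_frame(series: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: turn a consumption time series into a data frame."""
    # Rename the series so that the consumption column is called 'Values'
    return series.rename("Values").to_frame()


# Synthetic stand-in for the database query: one week of quarter-hourly values (96 per day)
index = pd.date_range("2016-01-04", periods=7 * 96, freq="15min")
series = pd.Series(np.random.default_rng(0).uniform(100, 200, len(index)), index=index)
frame = series_to_frame(series)
```

With matplotlib installed, `frame.plot()` draws a simple line plot of the load profile, roughly corresponding to what the second component does.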
Importing data from the database.

Data preparation

We assign the corresponding month, day of the week and hour of the day to each consumption value. Instead of using these time components as plain integers, we represent each of them by a sine and a cosine. This captures their cyclical nature: hour 23 and hour 0, or December and January, are neighbors even though their integer values lie far apart. We also determine for each value whether it falls on a weekend, a public holiday or a school vacation day. The data set generated in this way forms the basis for developing a model to predict electricity consumption for May 25, 2017.
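A minimal sketch of such a circular encoding in pandas/NumPy, assuming the data frame has a quarter-hourly `DatetimeIndex`. The helper and the column names are hypothetical and need not match the hetida component that performs this step. Each time component is mapped to an angle on a circle, so that 23:45 and 00:00 end up close to each other.

```python
import numpy as np
import pandas as pd


def add_cyclic_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: sine/cosine encodings of month, weekday and time of day."""
    idx = frame.index
    out = frame.copy()
    components = {
        # name: (position within the cycle, length of the cycle)
        "month": (idx.month - 1, 12),              # 0 = January ... 11 = December
        "weekday": (idx.dayofweek, 7),             # 0 = Monday ... 6 = Sunday
        "hour": (idx.hour + idx.minute / 60, 24),  # fractional hour, e.g. 10.25 for 10:15
    }
    for name, (position, period) in components.items():
        angle = 2 * np.pi * np.asarray(position, dtype=float) / period
        out[f"{name}_sin"] = np.sin(angle)
        out[f"{name}_cos"] = np.cos(angle)
    return out


# Two neighboring quarter hours across the turn of the year
idx = pd.date_range("2016-12-31 23:45", periods=2, freq="15min")
demo = add_cyclic_features(pd.DataFrame({"Values": [1.0, 2.0]}, index=idx))
```

In `demo`, the hour and month encodings of the two rows lie close together on the unit circle, although their raw integer values (23 vs. 0 and 11 vs. 0) differ strongly.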
Data preparation with the hetida designer
The data frame generated above is first extended to include the cyclical representation of the time information (Circular Representation of Time Components). Then the information is added as to whether a consumption value falls on a weekend (Weekend). The three other components decide whether a value falls on a school vacation day, a public holiday, or both a weekend and a public holiday. These three components can be used for any geographical region, with specific inputs in each case. For our company from Switzerland, for example, we select “country = CH, province = AG, state = None, year = 2016”.
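The weekend, holiday and school vacation flags can be sketched as 0/1 columns. In this minimal sketch the calendar data is passed in by the caller as plain dates; how the hetida components obtain holidays and vacations for “country = CH, province = AG” is not shown here. The helper, the column names and the vacation date in the example are hypothetical (the vacation day is not the real Aargau school calendar); May 5, 2016 is Ascension Day.

```python
import numpy as np
import pandas as pd


def add_calendar_flags(frame, holidays, vacations):
    """Hypothetical helper adding 0/1 flags for weekends, public holidays and school vacations.

    holidays:  iterable of dates that are public holidays in the region
    vacations: iterable of (first_day, last_day) pairs, both inclusive
    """
    out = frame.copy()
    days = out.index.normalize()  # drop the time of day for date comparisons
    out["Weekend"] = (out.index.dayofweek >= 5).astype(int)  # Saturday = 5, Sunday = 6
    out["Holiday"] = days.isin(pd.to_datetime(list(holidays))).astype(int)
    in_vacation = np.zeros(len(out), dtype=bool)
    for first, last in vacations:
        in_vacation |= (days >= pd.Timestamp(first)) & (days <= pd.Timestamp(last))
    out["SchoolVacation"] = in_vacation.astype(int)
    out["WeekendHoliday"] = out["Weekend"] * out["Holiday"]  # 1 only if both apply
    return out


# Thursday, May 5 (Ascension Day) to Sunday, May 8, 2016
idx = pd.date_range("2016-05-05", periods=4 * 96, freq="15min")
flags = add_calendar_flags(
    pd.DataFrame({"Values": 1.0}, index=idx),
    holidays=["2016-05-05"],
    vacations=[("2016-05-06", "2016-05-06")],  # hypothetical vacation day for illustration
)
```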

Linear regression

In addition to the regression of the consumption values, the underlying company is primarily interested in how well the model depicts the load profile. We analyze the quality of the model using the R² value. We also output the coefficients of the influencing variables to get an impression of which of these factors have a particularly strong influence on electricity consumption.
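For reference, this is the standard definition of the R² value (coefficient of determination) on n test values y_i with predictions ŷ_i and mean ȳ; we assume the goodness-of-fit component uses this standard definition. R² = 1 means a perfect fit, R² = 0 means the model is no better than always predicting the mean, and R² can become negative for models worse than that.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}
```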
Workflow of the entire linear regression.
The processed data is first prepared for the regression (‘train, test, split’): it is split into training and test data, with the ‘Values’ column passed as the target variable (label) of the regression. We select 20 percent of the data as test data (test_size = 0.2) and train the model on the remaining 80 percent. This training takes place in the next step (‘Linear Regression – Trained Model’). In the upper strand of the workflow, the trained model then generates predicted values (‘Predict Sklearn Trained Model’). Because the regression can occasionally predict negative values, which make no sense for electricity consumption, we set these values to zero (‘Negative to Zero’). Finally, the predicted values are visualized together with the test data. The two lower components output the coefficients (‘Linear Regression – Coefficients’) and the R² value (‘Linear Regression – Goodness of Fit’) of the linear regression.
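The steps of this workflow map naturally onto scikit-learn. The following is a minimal sketch, not the code of the hetida components: the helper name is hypothetical, the demo data is synthetic, and we assume that the R² value is computed on the clipped test predictions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def fit_linear_regression(data: pd.DataFrame, target="Values", test_size=0.2, seed=0):
    """Hypothetical helper: split, train, predict, clip negatives, report coefficients and R²."""
    X = data.drop(columns=[target])
    y = data[target]
    # 'train, test, split': 80 % training data, 20 % test data
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    # 'Negative to Zero': negative consumption makes no sense, so clip at zero
    predicted = np.clip(model.predict(X_test), 0, None)
    coefficients = pd.Series(model.coef_, index=X.columns)
    r2 = r2_score(y_test, predicted)
    return model, predicted, coefficients, r2


# Synthetic demo data: lower consumption on weekends plus a daily cycle and noise
rng = np.random.default_rng(1)
demo = pd.DataFrame({"Weekend": rng.integers(0, 2, 500), "hour_sin": rng.uniform(-1, 1, 500)})
demo["Values"] = 200 - 120 * demo["Weekend"] + 30 * demo["hour_sin"] + rng.normal(0, 5, 500)
model, predicted, coefficients, r2 = fit_linear_regression(demo)
```

On this demo data, the coefficient for `Weekend` comes out close to -120, mirroring the strongly negative weekend coefficient in the article. One design note: with a random split, neighboring quarter hours end up in both training and test data, so the test score may be more optimistic than the accuracy on a genuinely unseen period.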
The results log after executing the above workflow
As expected, the coefficients for weekends, school vacations and public holidays are strongly negative: on such days, the company consumes significantly less electricity. This information is particularly important for the subsequent forecast. At 0.82, the R² value is already quite good. The question is: should we now use the linear regression to predict May 25, 2017, or can we improve the model and its R² value even further?

Random Forest

In a second step, we pass the processed 2016 data set to a random forest algorithm. Once again, we analyze the quality of the model using the R² value. We also output the relative influence of each variable in percent, in descending order.
Workflow of random forest
The components have the same functions as in the linear regression workflow. However, whereas the linear regression output the coefficients of the influencing variables themselves, the random forest reports how strongly each variable influences the target variable, i.e. electricity consumption.
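A minimal scikit-learn sketch of the random forest step, assuming the influence values are the forest's impurity-based feature importances (which scikit-learn normalizes to sum to 1) expressed in percent. This is not the hetida component's code; the helper name and the synthetic demo data, including a pure noise column, are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def fit_random_forest(data, target="Values", test_size=0.2, seed=0, n_trees=100):
    """Hypothetical helper: train a random forest, report R² and importances in percent."""
    X = data.drop(columns=[target])
    y = data[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=n_trees, random_state=seed).fit(X_train, y_train)
    r2 = r2_score(y_test, model.predict(X_test))
    # Impurity-based importances sum to 1: convert to percent and sort descending
    importance = (pd.Series(model.feature_importances_, index=X.columns) * 100).sort_values(
        ascending=False
    )
    return model, r2, importance


# Synthetic demo: high consumption only on weekdays between 6:00 and 18:00
rng = np.random.default_rng(2)
weekday = rng.integers(0, 7, 1000)
hour = rng.uniform(0, 24, 1000)
working_time = (weekday < 5) & (hour >= 6) & (hour < 18)
demo = pd.DataFrame({"weekday": weekday, "hour": hour, "noise": rng.uniform(-1, 1, 1000)})
demo["Values"] = np.where(working_time, 200.0, 60.0) + rng.normal(0, 5, 1000)
model, r2, importance = fit_random_forest(demo)
```

Unlike a linear regression, the forest can represent such interactions (weekday and time of day together), which is one plausible reason for the higher R² value reported below.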
The results log after executing the above workflow
The day of the week has by far the greatest influence on electricity consumption. However, weekends, school vacations and public holidays also have significant influences that need to be taken into account when making a prediction. The R² value rises to 0.96, a considerable improvement over the 0.82 of the linear regression. The random forest is therefore a suitable basis for looking into the future.

Result: Forecast for May 25, 2017

If we zoom in on the week around May 25 in the visualization of the 2016 load profile, we see a familiar picture: a five-day week with a constant structure, framed by weekends with lower consumption.
The results log after executing the above workflow
For the prediction based on the random forest, we first generate a data set for 2017 that contains the same dependencies on the day of the week, time, public holidays and school vacations. We then use the random forest, trained on the 2016 data, to predict the 2017 values.
Workflow of the forecast for 2017
As above, the random forest algorithm is trained on the data for 2016. In parallel, a data set for 2017 is generated in the “Time Data” component, for our company using the input “country = CH, province = AG, state = None, year = 2017”. This is then passed to the random forest for prediction.
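The prediction step can be sketched as follows, assuming the 2017 features are produced by the same feature function as in training, so that columns and their order match. The helper `forecast_window`, the toy feature function and the toy training target are hypothetical; a real feature function would also include the 2017 holidays and school vacations.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def forecast_window(model, make_features, start, end, freq="15min"):
    """Hypothetical helper: predict consumption for every quarter hour from start to end (inclusive)."""
    index = pd.date_range(start, end, freq=freq)
    features = make_features(index)  # same columns, same order as the training data
    predicted = np.clip(model.predict(features), 0, None)  # no negative consumption
    return pd.Series(predicted, index=index, name="Forecast")


def toy_features(index):
    # Hypothetical feature function: weekend flag and fractional hour only
    return pd.DataFrame(
        {
            "Weekend": (index.dayofweek >= 5).astype(int),
            "hour": np.asarray(index.hour + index.minute / 60, dtype=float),
        },
        index=index,
    )


# Toy training data for May 2016 with a synthetic target (not the company's data)
train_index = pd.date_range("2016-05-01", "2016-05-31 23:45", freq="15min")
X = toy_features(train_index)
y = np.where(X["Weekend"] == 1, 60.0, 180.0)
model = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

forecast = forecast_window(model, toy_features, "2017-05-25 10:00", "2017-05-25 10:45")
```

Assuming timestamps mark the start of each quarter hour, the four values from 10:00 to 10:45 cover the requested hour. Whether they should be summed (energy per quarter hour) or averaged (average power) depends on the unit of the data, which the article does not state.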

Can we simply carry the 2016 values over to 2017? The plot of the predicted consumption values shows a different picture. On Thursday, May 25, the predicted consumption drops abruptly. The following Friday shows slightly higher consumption again, but still well below that of a “normal” Friday. What is the reason for this?

A look at the calendar reveals that May 25, 2017 is a public holiday, Ascension Day, and the forecast shows the company’s machines at a standstill on this day. Work resumes on the following Friday. However, it can be assumed that many employees take this bridge day off and the company does not run at full capacity, which explains the significantly lower forecast values. Bridge days are not an explicit input feature; the random forest presumably picked up this pattern from the 2016 data, in which Ascension Day also fell on a Thursday (May 5, 2016), and adjusted its energy forecast accordingly.
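The calendar check itself can also be automated. As a sketch that is not part of the hetida workflow described here, the following standard-library code computes Ascension Day as 39 days after Gregorian Easter Sunday, using the well-known anonymous Gregorian (Meeus/Jones/Butcher) Easter algorithm. It confirms that Ascension Day fell on a Thursday in both 2016 and 2017.

```python
from datetime import date, timedelta


def easter_sunday(year: int) -> date:
    """Gregorian Easter Sunday via the anonymous Gregorian (Meeus/Jones/Butcher) algorithm."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    ell = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * ell) // 451
    month, day = divmod(h + ell - 7 * m + 114, 31)
    return date(year, month, day + 1)


def ascension_day(year: int) -> date:
    # Ascension Day is the 40th day of Easter, i.e. 39 days after Easter Sunday
    return easter_sunday(year) + timedelta(days=39)


for year in (2016, 2017):
    ascension = ascension_day(year)
    print(year, ascension.isoformat(), ascension.weekday() == 3)  # weekday() == 3 means Thursday
# 2016 2016-05-05 True
# 2017 2017-05-25 True
```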