Forensic Modeling to Bridge Dataset Gaps – Making $ense of Energy
Modeling tools are used for more than forecasting. Completing the analysis of household energy consumption required forensic data modeling. Data from two different datasets were needed, and they came in slightly different formats that prevented a smooth alignment.
- The electric cooperative utility provided hourly energy use in kilowatt-hours (kWh), 24 hours a day for 14 months. As illustrated in Mapping 10,000 Points of Hourly Power Use – Making $ense of Energy, this data did not include solar power consumption. The utility also reported the power it purchased from us, but only as a monthly total, and that total is a subset of the solar power produced.
- The solar panel data provided power output in kilowatts (kW) every 5 minutes, but only during daylight hours when the sun was shining. This is 100 percent supply data, with no consumption data.
- There has been a subtle shift in this series from solar capacity in kW to energy production in kWh. A steady output of 1 kW sustained for one hour yields 1 kWh, so the two numbers are the same over a one-hour span. But over the course of a day, month, or year, cumulative kWh looks much different from the initial kW reading (which has no time dimension).
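The kW-to-kWh shift can be illustrated with a minimal sketch. It assumes each 5-minute reading represents the average output over its interval, which the panel data may or may not guarantee. The function name `hourly_kwh` and the sample values are hypothetical, not taken from the actual datasets.

```python
from collections import defaultdict
from datetime import datetime


def hourly_kwh(readings, interval_minutes=5):
    """Sum (timestamp, kW) samples into kWh per clock hour.

    Assumption: each reading is the average power over its interval,
    so energy per sample = kW * (interval in hours).
    """
    hours_per_sample = interval_minutes / 60.0
    totals = defaultdict(float)
    for timestamp, kw in readings:
        # Bucket each sample into the clock hour it falls in.
        hour_start = timestamp.replace(minute=0, second=0, microsecond=0)
        totals[hour_start] += kw * hours_per_sample
    return dict(totals)


# Hypothetical example: a steady 3 kW for one hour is twelve 5-minute samples.
sample = [(datetime(2024, 6, 1, 12, minute), 3.0) for minute in range(0, 60, 5)]
result = hourly_kwh(sample)
```

Twelve samples of 3 kW add up to about 3 kWh for that hour, which is why kW and kWh agree over a single hour and diverge over longer spans.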
The two datasets match well at the monthly level, but that is not sufficiently robust to answer household consumption questions. Plus, most of the data is available on an hourly basis for all of 2024. It doesn’t seem right to lose all that information by aggregating roughly 700 hourly data points each month into a single monthly data point.
To model at the hourly level, the missing information is direct consumption of solar power. The panels provide total production at the hourly level. The utility provides total consumption of non-solar power at the hourly level, and a monthly value of solar power they purchased.
There were no instructions on how to get this done. It was accomplished by analyzing data patterns and bridging gaps between datasets. The missing solar data was what the household consumed directly from the panels each day during daylight hours. Essentially, the solar data missing from the 10,000-points chart needed to be added back in.
Today’s chart does that. The forensically modeled missing data was recovered for hours between 8 a.m. and 6 p.m. This chart shows daily data for June 2024, the same month used in the other posts (but now in kWh rather than kW). The data for the non-daylight hours from the utility was essentially complete. The three lines in the chart have no mathematical meaning. They just provide a visual reference for the time from midnight to 8 a.m., 8 a.m. to 6 p.m., and 6 p.m. to midnight.
Modeling tends to reduce the random variability, as can be seen in the resulting data for each day in June 2024. But there is some variability retained. The new data was only for solar power consumption, but in June 2024 that may be nearly half of the power used in daylight hours. This dataset does not yet include weather data. Power use is heavily influenced by electric heat and air conditioning systems. That adaptation is underway, but not included in this current work.
Without going into the PhD-level analytics, the modeling process included the following:
- Set up data tables for solar output and utility power consumption by hour and day.
- Identified power consumed from the utility between 8 a.m. and 6 p.m. and segregated utility power consumed between 6 p.m. and 8 a.m. The latter was averaged for each day.
- Used monthly solar power purchased by the utility to establish monthly direct use (total output – solar sales to utility).
- Standardized solar production. Established the net (solar – utility) daily average production, excluding days of low production (cloudy days).
- Monthly solar power production directly consumed formed an upper limit on solar power consumption. This was mathematically weighted by daily production across each month.
- Once known direct consumption was distributed daily based on relative weighting, hourly solar consumption was added to utility-supplied power consumption, producing the chart for June 2024.
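The allocation steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual model: it skips the standardization and cloudy-day exclusion, and it assumes that within each day the direct consumption is split across the 8 a.m. to 6 p.m. hours in proportion to hourly production (the post does not specify the within-day split). All function names and data shapes are hypothetical.

```python
def allocate_direct_solar(hourly_production, monthly_sold, window=(8, 18)):
    """Spread monthly direct solar use across daylight hours.

    hourly_production: {(day, hour): kWh produced by the panels}
    monthly_sold: kWh the utility reports purchasing for the month
    Returns {(day, hour): kWh assumed consumed directly from the panels}.
    """
    total_output = sum(hourly_production.values())
    # Monthly direct use = total output - solar sales to utility.
    direct_total = total_output - monthly_sold
    if direct_total < 0:
        raise ValueError("reported sales exceed total production")

    start, end = window
    daily = {}
    for (day, hour), kwh in hourly_production.items():
        if start <= hour < end:
            daily.setdefault(day, {})[hour] = kwh

    window_total = sum(sum(hours.values()) for hours in daily.values())
    if window_total <= 0:
        raise ValueError("no production inside the daylight window")

    allocation = {}
    for day, hours in daily.items():
        day_total = sum(hours.values())
        # Weight each day by its share of the month's daylight production.
        day_share = direct_total * day_total / window_total
        for hour, kwh in hours.items():
            # Assumed within-day split: proportional to hourly production.
            allocation[(day, hour)] = day_share * kwh / day_total if day_total else 0.0
    return allocation


def total_hourly_use(utility_kwh, solar_direct):
    """Add modeled solar consumption to utility-supplied consumption."""
    keys = set(utility_kwh) | set(solar_direct)
    return {k: utility_kwh.get(k, 0.0) + solar_direct.get(k, 0.0) for k in keys}
```

Because every weight is a share of the monthly direct-use total, the hourly values always sum back to that total, which mirrors how the model is tied to the sales figure reported by the utility.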
Easy peasy.
That was the missing number, and it successfully bridged the two datasets in a meaningful way.
The most beneficial aspect of this exercise is wallowing in the data. As mentioned in Comparative Monthly Solar Output – Making $ense of Energy, the solar array is operating at design capacity. This home solar installation only operates about 20 percent of a 24-hour day at 70 percent of design capacity (due to cloud cover). For me, the nuances of this technology are best understood by watching how the data flows hour by hour, day by day, month by month, and really season by season.
This model provides hourly data for utility-supplied power consumption and solar panel-supplied power consumption, and it matches the monthly data provided by the utility on solar power purchased. To be clear, the model is based on the sales value reported by the utility. If that value were off by 10-20 percent, the model would simply adjust to the reported value. It does not independently validate the quantity sold to the utility.
It is important to understand that 1) data is powerful, 2) blending data from different sources should not be ruled out simply because the data is in different formats, and 3) vigilance is required. Realistically, there may not always be time to build models such as this. But the time and knowledge invested in this model cost far less than installing new sensors and technology to collect primary data.