Imputing Missing U-Values for Reliable Building Energy Modelling and Retrofit Planning
Accurate thermal transmittance (U-value) information for building envelopes is fundamental to energy performance modelling, retrofit prioritization, and large-scale decarbonization strategies. However, incomplete datasets, particularly missing U-values in existing building stock records, pose a major obstacle to evidence-based decision-making. This study addresses the problem by systematically evaluating data imputation methods to improve the reliability and usability of large building energy datasets.
Challenges of Missing Envelope Performance Data
In many national and municipal building databases, U-values are frequently absent due to inconsistent data collection, legacy construction records, or reporting gaps. Such missing values undermine the accuracy of energy simulations and retrofit impact assessments, leading to uncertainty in policy planning and investment decisions. Addressing these gaps requires imputation methods that balance accuracy, robustness, and computational efficiency.
Case Study and Dataset Description
The study employs Energy Performance Certificate (EPC) data from 112,125 domestic properties located in London’s Barnet borough. This large-scale dataset provides a realistic testing ground for comparing imputation techniques under conditions representative of real-world building stock analysis. The focus is placed on envelope U-values critical to predicting heating demand and overheating risk.
Comparison of Imputation Methods
Six imputation methods are evaluated, ranging from traditional techniques—mean and median imputation—to advanced data-driven approaches, including MICE with Bayesian Ridge Regression, k-nearest neighbours, Random Forest, and LightGBM. Performance is assessed using two key indicators: imputation error and execution time. The comparison highlights clear trade-offs between simplicity, computational cost, and predictive accuracy.
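The two indicators can be illustrated with a minimal sketch that hides a share of known values, imputes them, and records both the error and the run time. Only the two baseline methods (mean and median) are shown, using the Python standard library. The use of RMSE as the error metric, the 20% masking fraction, the synthetic U-values, and the helper name evaluate_imputer are all assumptions made for illustration; they are not taken from the study.

```python
import random
import statistics
import time


def evaluate_imputer(values, fill_fn, mask_fraction=0.2, seed=0):
    """Hide a random share of known values, impute them, and score the result.

    Returns (rmse, elapsed_seconds). RMSE is an assumed error metric.
    """
    rng = random.Random(seed)
    n_hidden = max(1, int(len(values) * mask_fraction))
    hidden = set(rng.sample(range(len(values)), n_hidden))
    observed = [v for i, v in enumerate(values) if i not in hidden]

    start = time.perf_counter()
    fill = fill_fn(observed)  # "fit" on the observed values only
    elapsed = time.perf_counter() - start

    squared_errors = [(values[i] - fill) ** 2 for i in hidden]
    rmse = (sum(squared_errors) / len(squared_errors)) ** 0.5
    return rmse, elapsed


# Synthetic wall U-values in W/m2K (illustrative only, not EPC data).
data_rng = random.Random(42)
u_values = [
    data_rng.choice([0.3, 0.6, 1.6, 2.1]) + data_rng.uniform(-0.05, 0.05)
    for _ in range(1000)
]

results = {
    name: evaluate_imputer(u_values, fn)
    for name, fn in [("mean", statistics.mean), ("median", statistics.median)]
}

for name, (rmse, seconds) in results.items():
    print(f"{name:>6}: RMSE = {rmse:.3f} W/m2K, time = {seconds * 1e3:.3f} ms")
```

The same masking harness could wrap any of the model-based imputers, so that every method is scored on identical hidden values.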
Sensitivity Analysis through Energy Simulation
To assess the practical implications of imputation uncertainty, a sensitivity analysis is conducted using EnergyPlus. U-values are perturbed based on each imputation method’s error magnitude, allowing the study to quantify impacts on predicted energy use and overheating risk. This step links statistical performance directly to building performance outcomes, strengthening the relevance of the findings for applied energy modelling.
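The idea of perturbing a U-value by an imputation error can be sketched with a simple steady-state proxy. This is not EnergyPlus and does not capture overheating risk: it uses the standard degree-day estimate of fabric heat loss (U x A x HDD x 24 h), and the wall area, degree-day total, imputed U-value, and error magnitude below are hypothetical inputs chosen only to show the mechanics.

```python
def annual_fabric_heat_loss_kwh(u_value, area_m2, heating_degree_days):
    """Steady-state conduction loss: U * A * HDD * 24 h/day, converted Wh -> kWh."""
    return u_value * area_m2 * heating_degree_days * 24 / 1000


def sensitivity(u_imputed, error, area_m2, heating_degree_days):
    """Return (low, base, high) heat loss for U shifted by -error, 0, +error."""
    low_u = max(u_imputed - error, 0.0)  # a U-value cannot be negative
    return tuple(
        annual_fabric_heat_loss_kwh(u, area_m2, heating_degree_days)
        for u in (low_u, u_imputed, u_imputed + error)
    )


# Hypothetical inputs: an imputed wall U-value of 1.5 W/m2K with an
# imputation error of 0.3 W/m2K, 80 m2 of wall, 2000 heating degree-days.
low, base, high = sensitivity(
    u_imputed=1.5, error=0.3, area_m2=80.0, heating_degree_days=2000.0
)

print(f"base estimate : {base:.0f} kWh/yr")
print(f"range         : {low:.0f} to {high:.0f} kWh/yr")
print(f"relative span : {(low - base) / base:+.0%} to {(high - base) / base:+.0%}")
```

Because this proxy is linear in U, a relative error in the imputed U-value carries straight through to the heat-loss estimate. A dynamic simulation such as EnergyPlus would generally not respond so simply, which is why the study runs the perturbation through a full model.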
Key Findings and Implications for Decarbonization
Results indicate that MICE combined with LightGBM delivers the most balanced performance, minimizing uncertainty in energy and overheating predictions while maintaining reasonable execution times. This approach offers a robust pathway for improving incomplete building stock datasets, enabling more reliable retrofit analysis and supporting scalable decarbonization strategies. The findings provide actionable guidance for researchers, policymakers, and practitioners working with large, imperfect building energy datasets.
