EPC-Based Clustering Framework for Representative Building Selection and Scalable Energy Simulation


Identifying representative buildings is essential for scalable urban energy simulation and effective retrofit planning. However, continuous monitoring data are often unavailable, which limits data-driven classification approaches. This study presents a structured methodology for selecting representative buildings by clustering Energy Performance Certificate (EPC) attributes. By grouping buildings on certificate-derived features and validating the clusters against thermal performance indicators, the research establishes a generalizable framework for targeting simulations and supporting policy without relying on long-term metering.

Six-Phase Clustering Workflow and Internal Validation

The proposed methodology follows a six-phase workflow that begins with EPC attribute preparation and feature engineering. Three clustering techniques, K-Medoids, Agglomerative clustering, and Gaussian Mixture Model (GMM), were implemented to group buildings by similarity. Cluster quality was assessed with internal validation metrics: the Silhouette score, the Calinski-Harabasz index, and the Davies-Bouldin index. Across these indices, Agglomerative clustering most often achieved the best internal scores, indicating compact, well-defined groups.
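To make the clustering and internal-validation steps concrete, here is a minimal plain-Python sketch of a simple alternating K-Medoids and the Silhouette score. The six buildings and their two standardized features (air leakage, UA value) are hypothetical. The study's actual feature set, preprocessing, and implementation are not described in the text. In practice one would more likely use scikit-learn (`AgglomerativeClustering`, `GaussianMixture`, `silhouette_score`, `calinski_harabasz_score`, `davies_bouldin_score`) and `KMedoids` from scikit-learn-extra.

```python
import math
import random


def assign(points, medoids):
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: math.dist(p, points[medoids[c]]))
            for p in points]


def k_medoids(points, k, n_iter=100, seed=0):
    """Alternating K-Medoids: assign points to the nearest medoid, then move each
    medoid to the member with the smallest total distance to its own cluster."""
    medoids = random.Random(seed).sample(range(len(points)), k)
    for _ in range(n_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster empties out
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(members, key=lambda i: sum(
                math.dist(points[i], points[j]) for j in members)))
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)


def silhouette(points, labels):
    """Mean Silhouette coefficient (b - a) / max(a, b); needs at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        own = [j for j, lab in enumerate(labels) if lab == labels[i] and j != i]
        if not own:  # singleton cluster: coefficient defined as 0
            scores.append(0.0)
            continue
        a = sum(math.dist(p, points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(p, points[j]) for j, lab in enumerate(labels) if lab == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical standardized EPC features per building: (air leakage, UA value).
buildings = [(-1.2, -1.0), (-1.0, -1.1), (-1.1, -0.9),
             (1.0, 1.1), (1.2, 0.9), (1.1, 1.0)]
medoids, labels = k_medoids(buildings, k=2)
print("medoids:", medoids, "labels:", labels)
print("silhouette:", round(silhouette(buildings, labels), 3))
```

The medoid indices returned here are actual buildings in the dataset. This is what makes K-Medoids attractive when each cluster must be represented by a real building rather than an averaged one.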

Case Study Application in Helsinki

An EPC database of educational buildings in Helsinki was used to demonstrate the framework's applicability. The dataset supported large-scale clustering and a subsequent evaluation of thermal representativeness. Linkage distance analysis, together with performance consistency, led to the identification of four meaningful building clusters. The case study demonstrates the feasibility of EPC-based grouping for urban-scale simulation strategies in cold-climate contexts.
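The text does not say which linkage criterion or cut rule underlies the linkage distance analysis. One plausible reading is sketched below in plain Python: a naive average-linkage agglomeration, with the number of clusters chosen at the largest jump between consecutive merge distances. Average linkage, the largest-gap rule, and the synthetic four-group data are all assumptions for illustration. The study's choice of four clusters also weighed performance consistency, which this sketch does not model.

```python
import math
from itertools import combinations


def average_linkage_heights(points):
    """Naive average-linkage (UPGMA) agglomeration.

    Returns the linkage distance at which each successive merge happens."""
    clusters = [[i] for i in range(len(points))]

    def avg_dist(a, b):
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    heights = []
    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest average pairwise distance.
        ia, ib = min(combinations(range(len(clusters)), 2),
                     key=lambda pair: avg_dist(clusters[pair[0]], clusters[pair[1]]))
        heights.append(avg_dist(clusters[ia], clusters[ib]))
        merged = clusters[ia] + clusters[ib]
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)] + [merged]
    return heights


def k_from_largest_gap(heights):
    """Cut the dendrogram inside the largest jump between consecutive merge heights."""
    gaps = [heights[m + 1] - heights[m] for m in range(len(heights) - 1)]
    m = max(range(len(gaps)), key=gaps.__getitem__)
    # After merges 0..m there are n - (m + 1) clusters, with n = len(heights) + 1.
    return len(heights) - m


# Hypothetical 2-D feature space with four well-separated groups of three buildings.
offsets = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2)]
centres = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0), (5.0, 5.0)]
points = [(cx + dx, cy + dy) for cx, cy in centres for dx, dy in offsets]

heights = average_linkage_heights(points)
print("merge heights:", [round(h, 2) for h in heights])
print("suggested number of clusters:", k_from_largest_gap(heights))
```

For real datasets, SciPy's `scipy.cluster.hierarchy.linkage` and `fcluster` compute the same kind of dendrogram far more efficiently than this O(n^4) loop.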

Thermal Validation Through Regression Modeling

To evaluate thermal validity, cluster-specific regression models were developed to predict District Heating (DH) demand from outdoor temperature. Linear Regression, Random Forest, and XGBoost models were trained within each cluster and compared with global models trained on the entire dataset. The cluster-based models consistently achieved higher predictive accuracy than the global models, indicating that EPC-derived clusters capture differences in thermal behaviour and heating response.
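The cluster-versus-global comparison can be illustrated with the simplest of the three model types. The sketch below fits one ordinary least-squares line of DH demand against outdoor temperature for the whole stock, then one line per cluster, and compares pooled RMSE. The two clusters, their linear heating curves, and the temperatures are hypothetical. The sketch omits Random Forest and XGBoost, and for brevity it scores in-sample, whereas a proper comparison would use held-out data.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rmse(xs, ys, coef):
    """Root-mean-square error of the fitted line on (xs, ys)."""
    a, b = coef
    return math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Hypothetical records: (cluster label, outdoor temperature in °C, DH demand in kW).
temps = [-20, -10, 0, 10]
records = [("A", t, 100 - 4.0 * t) for t in temps] + [("B", t, 60 - 1.5 * t) for t in temps]

# Global model: one regression line for the whole building stock.
all_t = [t for _, t, _ in records]
all_q = [q for _, _, q in records]
global_rmse = rmse(all_t, all_q, fit_line(all_t, all_q))

# Cluster models: one line per cluster, squared errors pooled over all records.
sq_err = 0.0
for c in sorted({lab for lab, _, _ in records}):
    ts = [t for lab, t, _ in records if lab == c]
    qs = [q for lab, _, q in records if lab == c]
    sq_err += rmse(ts, qs, fit_line(ts, qs)) ** 2 * len(ts)
cluster_rmse = math.sqrt(sq_err / len(records))

print(f"global RMSE = {global_rmse:.2f} kW, cluster RMSE = {cluster_rmse:.2f} kW")
```

The global line averages two different heating slopes, so it misfits both groups. The per-cluster lines recover each slope, which is the mechanism behind the accuracy gain reported above.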

External Cluster Alignment and Statistical Differentiation

To assess external consistency, DH-based clustering results were compared with EPC-derived clusters using Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI). GMM exhibited the strongest external alignment, achieving ARI = 0.555 and NMI = 0.571. Statistical differentiation tests, including ANOVA and Kruskal–Wallis with False Discovery Rate correction, confirmed significant differences in EPC attributes across clusters. Furthermore, medoid buildings were shown to represent cluster means effectively, with correlation coefficients ranging from 0.92 to 0.98, supporting their use as representative simulation models.
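The external-agreement and multiple-testing steps rest on standard formulas. The plain-Python sketch below computes the Adjusted Rand Index from the pair-counting contingency table, NMI normalized by the arithmetic mean of the two entropies (scikit-learn's default for `normalized_mutual_info_score`), and Benjamini-Hochberg FDR-adjusted p-values. The label vectors and p-values are hypothetical. The text does not state which FDR procedure or NMI normalization the study used, so Benjamini-Hochberg and the arithmetic mean are assumptions. In practice, `sklearn.metrics.adjusted_rand_score`, `scipy.stats.f_oneway`, `scipy.stats.kruskal`, and statsmodels' `multipletests(..., method="fdr_bh")` cover these steps.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(a, b):
    """Adjusted Rand Index from the pair-counting contingency table."""
    n = len(a)
    sum_ij = sum(comb(v, 2) for v in Counter(zip(a, b)).values())
    sum_a = sum(comb(v, 2) for v in Counter(a).values())
    sum_b = sum(comb(v, 2) for v in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings are one cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(a, b):
    """Mutual information normalized by the arithmetic mean of the two entropies."""
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    mi = sum((nij / n) * log(n * nij / (ca[i] * cb[j]))
             for (i, j), nij in Counter(zip(a, b)).items())
    denom = (entropy(a) + entropy(b)) / 2
    return mi / denom if denom > 0 else 1.0


def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg FDR-adjusted p-values and rejection decisions."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted, [q <= alpha for q in adjusted]


# Hypothetical labels for nine buildings: EPC-derived vs DH-derived clusters.
epc_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
dh_labels = [1, 1, 1, 0, 0, 2, 2, 2, 2]
print("ARI:", round(adjusted_rand_index(epc_labels, dh_labels), 3))
print("NMI:", round(normalized_mutual_info(epc_labels, dh_labels), 3))

# Hypothetical per-attribute p-values, e.g. from ANOVA or Kruskal-Wallis tests.
adjusted, reject = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print("BH-adjusted:", [round(q, 4) for q in adjusted], "reject:", reject)
```

Both ARI and NMI ignore how the cluster IDs are numbered, which matters here because EPC-based and DH-based clusterings assign their labels independently.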

Feature Importance and Policy Implications

Feature importance analysis identified air leakage rate as the dominant predictor of heating demand, followed by the UA value (the envelope's total heat-transfer coefficient, i.e. the sum of U-value × area over the envelope components) and the overall energy performance (EP) value. These findings highlight the building characteristics that most strongly influence thermal performance and cluster formation. The proposed methodology enables meaningful building classification and representative selection without detailed metering, providing a scalable approach to energy simulation, retrofit prioritization, and data-informed energy policy.
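The text does not say how feature importance was computed. One common, model-agnostic option is permutation importance: shuffle one feature at a time and measure how much the prediction error grows. The plain-Python sketch below applies it to a fixed linear predictor. The feature names, coefficients, and synthetic data are hypothetical and were chosen only to show the mechanics, so the resulting ranking says nothing about the study's data. scikit-learn's `sklearn.inspection.permutation_importance` implements the same idea for fitted estimators such as Random Forest or XGBoost.

```python
import random

FEATURES = ["air_leakage", "ua_value", "ep_value"]  # hypothetical standardized EPC inputs
WEIGHTS = [3.0, 1.5, 0.5]  # hypothetical coefficients of an already-fitted model


def predict(row):
    """Heating demand predicted by the (hypothetical) fitted linear model."""
    return sum(w * x for w, x in zip(WEIGHTS, row))


def mse(rows, y):
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)


def permutation_importance(rows, y, n_repeats=20, seed=0):
    """Mean increase in MSE when one feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = mse(rows, y)
    scores = {}
    for j, name in enumerate(FEATURES):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mse(permuted, y) - baseline)
        scores[name] = sum(increases) / n_repeats
    return scores


# Synthetic data: target generated from the same weights plus noise (illustration only).
data_rng = random.Random(42)
rows = [[data_rng.gauss(0, 1) for _ in FEATURES] for _ in range(50)]
y = [predict(r) + data_rng.gauss(0, 0.5) for r in rows]

importances = permutation_importance(rows, y)
for name, score in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {score:.2f}")
```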

