Predictive Analytics for Buildings
Engel Ching | Architecture Analytics

About Architek

Predictive Analytics for Energy Efficiency in Buildings. Using machine learning together with building occupancy and weather data, the prediction model provides near-future load demand information to the building management system (BMS). This lets the building closely match its output to anticipated changes in demand, which reduces energy consumption, lowers operating costs and provides healthier indoor environments.


Buildings consume nearly half of all the energy produced and are responsible for nearly half of carbon emissions in the United States, significantly more than all the vehicles on the road. Mechanical and electrical systems account for close to 70% of a building's energy usage. Currently, these systems are reactive rather than predictive.

There are two main factors that determine heating and cooling loads: the temperature outside and the number of occupants inside. A few hours before we all arrive for work in the morning, a building typically starts up and pre-heats or pre-cools our workspace. It does this without knowing how many people are coming to work and what temperature it will be outside.

What if buildings knew this information ahead of time? Based on consultations with energy experts, a building could save energy if it acted proactively on predicted occupancy and weather data.

Goals + Solutions

Reduce building energy consumption, reduce operating costs and provide healthier indoor environments with increased use of natural ventilation. Understand time series correlations between weather and building occupancy data to predict short-term future demand. Through this project, I was able to show, as a proof of concept, that significant cost and energy savings are possible if the building automation system can anticipate demand.

Technical Summary

For this high-level approach, BART data was used to infer the percentage of workers who are actually going to work in downtown San Francisco. I obtained five years of BART hourly exit data by writing a formal California Public Records Act request. Weather data was also used to determine if it affects work attendance.

Time Series Analysis

Morning commute data to downtown San Francisco was inferred by filtering BART exit counts between 5 and 11 am at Embarcadero, Montgomery, Powell and Civic Center stations. Below is a graph showing how many people go to work by season. Fewer people work during winter months, and on a weekly basis, Mondays and Fridays tend to be slower.
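The filtering step can be sketched in plain Python as below. This is a minimal illustration, not the project's actual code: the station names and the 5 to 11 am window come from the text, while the record layout (timestamp, station, exits), the function name `daily_morning_exits`, and the sample rows are assumptions made up for the example.

```python
from collections import defaultdict
from datetime import datetime

# Station names come from the text; the record layout is an assumption.
DOWNTOWN_STATIONS = {"Embarcadero", "Montgomery", "Powell", "Civic Center"}
MORNING_HOURS = range(5, 11)  # hourly buckets starting at 5:00 through 10:00


def daily_morning_exits(records):
    """Sum downtown morning exits per calendar day.

    `records` is an iterable of (timestamp, station, exits) tuples, where
    the timestamp marks the start of the hour being counted.
    """
    totals = defaultdict(int)
    for timestamp, station, exits in records:
        if station in DOWNTOWN_STATIONS and timestamp.hour in MORNING_HOURS:
            totals[timestamp.date()] += exits
    return dict(totals)


# Made-up rows, only to show the shape of the input.
sample = [
    (datetime(2015, 3, 2, 8), "Embarcadero", 120),
    (datetime(2015, 3, 2, 9), "Montgomery", 80),
    (datetime(2015, 3, 2, 14), "Powell", 50),    # outside the morning window
    (datetime(2015, 3, 2, 8), "Daly City", 40),  # not a downtown station
]
morning = daily_morning_exits(sample)
```

Only the first two sample rows pass both filters, so the single day in `morning` sums to 200 exits.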

The chart below shows that we have nice weather for about half the year. We should be able to use more natural air during work hours and this would result in a healthier environment while saving energy.

The chart below shows a cross-correlation analysis between weather and BART time series data. The top chart shows that warm weather correlates with an increase in BART ridership. The bottom chart shows that an increase in pressure indicates a drop in BART ridership in the coming weeks. It turns out that atmospheric pressure is a good indicator of future weather.
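As a rough sketch of what a lagged cross-correlation computes, the snippet below correlates one series with shifted copies of another. The helper names `pearson` and `lagged_correlation` and the toy series are assumptions for illustration; the original analysis may well have used a statistics library instead.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(driver, response, max_lag):
    """Correlate driver[t] with response[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        shifted = response[lag:]
        result[lag] = pearson(driver[: len(shifted)], shifted)
    return result


# Toy series: `response` repeats `driver` two steps later.
driver = [1, 3, 2, 5, 4, 6, 5, 8]
response = [0, 0, 1, 3, 2, 5, 4, 6]
corr = lagged_correlation(driver, response, max_lag=3)
best_lag = max(corr, key=corr.get)
```

A strong correlation at a positive lag means the driver series (weather, here) moves before the response series (ridership) does. A negative value at that lag would match the pressure pattern described above.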

The plot below shows the standardized time series data from 2011-2015. The black line shows BART exit counts. The sharp dips represent the Thanksgiving and Christmas holidays and the BART strikes in 2013. The seasonal upward and downward trends appear to follow the temperature, shown in orange. I also included the flu trend, shown in green, and it appears that a rise in flu cases indicates a drop in BART ridership.

BART data was normalized using a two-step process. First, I trimmed low outliers at the 1st percentile and high outliers at the 99th percentile. Then, to remove the upward trend, I normalized the data using a moving maximum, shown as the red line.
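One way to read the two-step normalization is sketched below. The write-up gives neither the percentile definition nor the moving-maximum window, so the nearest-rank percentile, the trailing window, and the `window=30` value in the demo are all assumptions. The sketch also treats "trimmed" as clipping values to the percentile bounds rather than dropping them.

```python
from math import ceil


def percentile(values, pct):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def clip_outliers(values, low_pct=1, high_pct=99):
    """Clamp values below/above the given percentiles to those bounds."""
    low, high = percentile(values, low_pct), percentile(values, high_pct)
    return [min(max(v, low), high) for v in values]


def moving_max(values, window):
    """Trailing maximum over the last `window` points, current one included."""
    return [max(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]


def normalize(values, window):
    """Clip outliers, then divide by the moving maximum to remove the trend.

    Assumes strictly positive counts, so the moving maximum is never zero.
    """
    clipped = clip_outliers(values)
    ceiling = moving_max(clipped, window)
    return [v / m for v, m in zip(clipped, ceiling)]


# Toy series with steady growth; the window length is an assumed value.
growing = list(range(1, 201))
ratios = normalize(growing, window=30)
```

On a steadily growing series every point equals its own trailing maximum, so the ratios are flat at 1.0 and the upward trend is gone. On real counts, quieter days would show up as values below 1.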

Random Forest Machine Learning

These features were used in a random forest machine learning model. Overall, weather was not a big factor in predicting work attendance in San Francisco because the weather there is mild. The model was also tested on Chicago CTA data for comparison, where weather was more of a factor because of the city's extreme temperatures. However, prediction accuracy was over 94% for both cities, suggesting that our seasonal schedules are highly predictable.
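For readers unfamiliar with the technique, here is a minimal scikit-learn sketch of fitting a random forest and inspecting feature importances. The data is synthetic and generated in the snippet itself. The two features, the choice of a regressor rather than a classifier, and the train/test split are all assumptions, so this shows the mechanics and does not reproduce the project's model or its accuracy figures.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in data, one row per workday. Column choices are assumptions.
n_days = 400
day_of_week = np.arange(n_days) % 5        # 0 = Monday ... 4 = Friday
temperature = rng.normal(60, 8, n_days)    # degrees Fahrenheit

# Attendance ratio: slightly lower on Mondays and Fridays, plus small noise.
slow_day = np.isin(day_of_week, [0, 4])
attendance = 0.95 - 0.05 * slow_day + rng.normal(0, 0.01, n_days)

X = np.column_stack([day_of_week, temperature])
split = 300                                # keep time order: train on the past

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[:split], attendance[:split])

predicted = model.predict(X[split:])
mean_abs_error = float(np.mean(np.abs(predicted - attendance[split:])))
importances = dict(zip(["day_of_week", "temperature"], model.feature_importances_))
```

In this synthetic setup temperature has no effect on attendance by construction, so `importances` should weight `day_of_week` far more heavily. That is the kind of comparison that indicates how much weather matters in a given city.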

Data Pipeline

I performed time series analysis on five years of weather and BART data, which helped me create features for the random forest model. Weather forecasts are requested via API daily, then processed and stored in a PostgreSQL database. The prediction model then forecasts the percentage of worker attendance in San Francisco for the next few days.
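The store-then-read part of the pipeline might look roughly like the sketch below. To keep it self-contained it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, and hard-coded rows stand in for the weather API response. The table name, column names, and values are hypothetical.

```python
import sqlite3


def store_forecasts(conn, forecasts):
    """Insert or refresh one row per forecast day."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS forecast ("
        "day TEXT PRIMARY KEY, high_f REAL, pressure_mb REAL)"
    )
    # SQLite syntax; PostgreSQL would use INSERT ... ON CONFLICT instead.
    conn.executemany(
        "INSERT OR REPLACE INTO forecast (day, high_f, pressure_mb) "
        "VALUES (?, ?, ?)",
        forecasts,
    )
    conn.commit()


def upcoming_features(conn, start_day, n_days):
    """Return the next `n_days` forecast rows from `start_day` onward."""
    rows = conn.execute(
        "SELECT day, high_f, pressure_mb FROM forecast "
        "WHERE day >= ? ORDER BY day LIMIT ?",
        (start_day, n_days),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")

# Made-up forecast rows standing in for a daily API response.
store_forecasts(conn, [
    ("2016-01-04", 58.0, 1021.0),
    ("2016-01-05", 61.0, 1018.5),
    ("2016-01-06", 55.0, 1012.0),
])
# The next day's request refreshes an existing row instead of duplicating it.
store_forecasts(conn, [("2016-01-05", 62.0, 1018.0)])

features = upcoming_features(conn, "2016-01-05", 2)
```

Keying the table on the forecast day means each daily refresh overwrites stale forecasts. The rows returned by `upcoming_features` would then be turned into model inputs for the next few days.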

Cost, Energy and Emissions Reduction Calculations

To get some insight into the potential impact of using predictive analytics in our buildings, I performed back-of-the-envelope calculations. I consulted energy experts and established a 5% energy reduction as a conservative estimate based on existing use cases.

In total there are more than 1700 commercial buildings in San Francisco. What if all these buildings used a predictive system? It turns out that the results are quite staggering with potentially more than 1 billion dollars saved, as well as significant savings in CO2 emissions and energy.
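The shape of such a back-of-the-envelope calculation is sketched below. Only the building count, the 5% reduction, and the 1 billion dollar figure come from the text. The per-building baseline cost and the time horizon are not stated here, so the snippet works backwards to the baseline those three numbers would imply; it is not the original calculation, and the function names are hypothetical.

```python
def portfolio_savings(n_buildings, baseline_cost_per_building, reduction):
    """Total savings if every building cuts its energy cost by `reduction`."""
    return n_buildings * baseline_cost_per_building * reduction


def implied_baseline(total_savings, n_buildings, reduction):
    """Per-building baseline energy cost needed to reach `total_savings`."""
    return total_savings / (n_buildings * reduction)


N_BUILDINGS = 1700     # from the text
REDUCTION = 0.05       # 5% conservative estimate, from the text
TARGET_SAVINGS = 1e9   # "more than 1 billion dollars", from the text

baseline = implied_baseline(TARGET_SAVINGS, N_BUILDINGS, REDUCTION)
check = portfolio_savings(N_BUILDINGS, baseline, REDUCTION)
```

Here `baseline` comes out to roughly 11.8 million dollars of energy spending per building, accumulated over whatever period the estimate covers. Validating that figure against real building data is one of the future work items listed below.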

Future work

  • Acquire building level user data
  • Include more seasonal data (school breaks, etc.)
  • Work with building control systems companies
  • Validate calculations (EPA Target Finder)
  • Implement!

This project demonstrates the potential benefits of predictive analytics in our buildings. The prediction model is a highly scalable solution and we can only begin to imagine the potential benefits globally. Future work towards implementation has been identified, which includes acquiring more building-specific data and validating cost and savings calculations. Seasonal and individual work patterns appear to be significant factors in predicting our work schedules and should be incorporated into the model. Hourly occupancy data can be incorporated to allow more fine-grained predictions, which could lead to more cost and energy savings.

Special Thanks

Dimitri Contoyannis (Director, Sustainability Services, Noresco)
Kristen Salinas (Senior Sustainability Consultant, Noresco)
Pam Herhold (Bay Area Rapid Transit)
Ze'ev Gilovitz (weather API Python wrapper)


scripting language that you can fly with
exploratory data analysis, data wrangling
scikit-learn
machine learning
cross correlation time series analysis
SQL database
SQL database adapter for Python
web app framework
host web app, cloud data storage

2011-2015 hourly/daily weather + forecast
2011-2015 hourly exit counts
Chicago CTA
2011-2015 daily ridership
NOAA Weather
2011-2015 daily weather
used to cross check commute hours and stations


Visit the project repo on GitHub

Architek on GitHub


Feel free to email me with any comments or to just say hello!