Source: http://datascience.ibm.com/blog/modeling-energy-usage-in-new-york-city/
On June 6 we introduced the IBM Data Science Experience to the world at the Spark Maker Event that took place at Galvanize. We demonstrated the Experience with a real use case developed in partnership with BlocPower.
BlocPower is a startup based in New York City. Its technology and finance platform develops clean energy projects in American inner cities. IBM Data Science Experience helped BlocPower perform a comprehensive energy audit of each property to determine the right mix of high-efficiency technology to reduce each customer's energy consumption. Tooraj Arvajeh, Chief Engineering Officer at BlocPower, explained how IBM Data Science Experience made this process simpler.
"BlocPower's operation is diverse, from outreach and targeting, origination of investment-grade clean energy projects to financing projects through our crowdfunding marketplace. Data is the underlying tool of our operation and IBM's Data Science Experience will facilitate a closer integration across it and help our business scale up faster."
Goals of the demo:
- Easily import data into a notebook from object storage to quickly start analyzing data and creating predictive models.
- Model energy usage of buildings in kWh.
- Identify buildings that consume energy inefficiently.
- Create a project and collaborate with other data scientists.
- Create an easy-to-use application to make the output of the models consumable by any user.
To do that, we used tools that data scientists love today and that are integrated into the IBM Data Science Experience: Jupyter notebooks connected to Apache Spark, RStudio, Shiny, and GitHub.
These are the steps that we followed:
1- GitHub + Jupyter notebooks = <3
When starting a new project, the data scientist can choose to start from scratch or to leverage somebody else's work. In this case, we showcase the Import from URL capability to import an existing notebook from GitHub and start working on it right away. There are more than 200k public Jupyter notebooks out there that you can use!
2- Load and clean data
To analyze data in a Jupyter notebook, first load the data. Many libraries and commands can do that, but it's not always obvious which one to use. One of the add-ons to Jupyter notebooks is the capability to access data files stored in object storage or available through data connections and, in one click, to add the code needed to load the data into the notebook.
Once the data is loaded, the next step is to clean it. We created a library called Sparkling.Data, which can scale to large data, to help the data scientist perform this task.
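As a rough illustration of the load-then-clean step, here is a minimal sketch that uses only the Python standard library. It is a stand-in, not the generated object storage code and not the Sparkling.Data API, neither of which is shown in this post. The CSV sample, the column names, and the load_and_clean helper are all hypothetical.

```python
import csv
import io

# Hypothetical sample standing in for a file read from object storage.
RAW_CSV = """building_id,age_years,stories,square_feet,kwh
B1,85,6,42000,310000
B2,,4,18000,120000
B3,40,12,95000,n/a
B4,60,5,30000,205000
"""

# Hypothetical column names; the real data set may use different ones.
NUMERIC_COLUMNS = ["age_years", "stories", "square_feet", "kwh"]


def load_and_clean(text):
    """Parse CSV text and keep only rows whose numeric fields all parse."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        try:
            cleaned = {name: float(record[name]) for name in NUMERIC_COLUMNS}
        except ValueError:
            # Skip rows with missing or non-numeric values.
            continue
        cleaned["building_id"] = record["building_id"]
        rows.append(cleaned)
    return rows


buildings = load_and_clean(RAW_CSV)
print([b["building_id"] for b in buildings])
```

Dropping incomplete rows is only one possible cleaning policy; imputing missing values may be a better fit when the data set is small.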
3- Data Exploration
After cleaning the data, we used Matplotlib, the best tool available for data visualization in Python, to explore the correlations between energy usage and building characteristics such as age, number of stories, square footage, amount of plugged equipment, and domestic and heating gas consumption. By analyzing variable relationships, the data scientist can, for example, determine the best model to use and which variables have more predictive power.
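The numeric side of this exploration can be sketched as follows, assuming NumPy is available. The data is synthetic and generated on the spot, and the variable names and coefficients are assumptions made for illustration, not values from the BlocPower data set.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic building characteristics (illustrative values only).
square_feet = rng.uniform(5_000, 100_000, n)
age_years = rng.uniform(5, 120, n)

# Synthetic energy usage driven mostly by floor area, plus noise.
kwh = 3.0 * square_feet + 200.0 * age_years + rng.normal(0, 20_000, n)

features = {"square_feet": square_feet, "age_years": age_years}

# Pearson correlation of each characteristic with energy usage.
correlations = {
    name: float(np.corrcoef(values, kwh)[0, 1])
    for name, values in features.items()
}

# List the characteristics from strongest to weakest correlation.
for name, r in sorted(correlations.items(), key=lambda item: -abs(item[1])):
    print(f"{name}: r = {r:+.2f}")
```

A scatter plot of each characteristic against kWh is the visual counterpart of these coefficients, and it also reveals nonlinear relationships that a single correlation value can hide.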
4- Create a Prediction Model
Our goal is to create a model that predicts the energy consumption in kWh of different buildings based on characteristics such as square feet, age, number of stories, and so on. We model energy usage with a linear regression using the algorithm included in scikit-learn, one of the best Python libraries for machine learning. Before running the linear regression, we used the MaxAbsScaler class from scikit-learn to scale the data. To visualize the fit of this model, we use a scatter plot of the observed vs. the predicted values. The resulting R-squared value was about 0.72.
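The modeling step described above might look roughly like this with scikit-learn. This is a minimal sketch on synthetic data: the feature columns, the coefficients, and the noise level are assumptions, so the R-squared it prints will not match the 0.72 obtained on the real data. The score is also computed in-sample, since the post does not say whether a held-out set was used.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MaxAbsScaler

rng = np.random.default_rng(42)
n = 300

# Synthetic features: square feet, age in years, number of stories.
X = np.column_stack([
    rng.uniform(5_000, 100_000, n),
    rng.uniform(5, 120, n),
    rng.integers(1, 30, n),
])

# Synthetic target in kWh: a linear signal plus Gaussian noise.
y = (3.0 * X[:, 0] + 150.0 * X[:, 1] + 2_000.0 * X[:, 2]
     + rng.normal(0, 25_000, n))

# Scale each feature by its maximum absolute value, then fit the regression.
model = make_pipeline(MaxAbsScaler(), LinearRegression())
model.fit(X, y)

predicted = model.predict(X)
r_squared = r2_score(y, predicted)
print(f"R-squared: {r_squared:.2f}")
```

Plotting y against predicted as a scatter plot gives the observed vs. predicted view mentioned above; points close to the diagonal indicate a good fit.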
5- Classify buildings by efficiency
We used the popular K-means algorithm to cluster buildings in NYC based on 4 dimensions that indicate energy efficiency: gas use for heating, gas use for domestic purposes, electricity use for plugged equipment, and electricity use for air conditioning. In the next matplotlib plot, we colored our buildings by using the K-means labels with K=4 and using 2 out of the 4 dimensions. This visualization, and other visualizations not shown here, helped us reduce the 4 clusters to two. These 2 clusters of buildings were interpreted as the efficient and the inefficient groups of buildings.
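A minimal sketch of the K=4 clustering, assuming scikit-learn's KMeans and synthetic usage data. The four columns mirror the dimensions listed above, but the units, the group centers, and the spread are invented for illustration. Whether the original notebook scaled the data before clustering is not stated, so no scaling is applied here.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)

# Columns (illustrative units): heating gas, domestic gas,
# plug-load electricity, air-conditioning electricity.
centers = np.array([
    [20.0, 5.0, 10.0, 8.0],
    [60.0, 15.0, 30.0, 25.0],
    [20.0, 15.0, 30.0, 8.0],
    [60.0, 5.0, 10.0, 25.0],
])

# 50 synthetic buildings scattered around each center.
usage = np.vstack([rng.normal(c, 2.0, size=(50, 4)) for c in centers])

# Cluster into K=4 groups, as in the text.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(usage)

# Summarize each cluster by its size and mean usage per dimension.
for cluster in range(4):
    members = usage[labels == cluster]
    print(cluster, len(members), members.mean(axis=0).round(1))
```

Comparing the per-cluster means is one way to decide which clusters belong together; in the demo, that judgment was made by inspecting plots such as a scatter of two dimensions colored by label.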
6- Flexdashboard and Shiny in RStudio
RStudio just published on CRAN a new R package called Flexdashboard. This great package makes it easy to create dashboards, and you can include Shiny code to make them interactive. A dashboard can be shared with anyone by simply sending the URL.
The dashboard is divided into 4 sections:
- Data Exploration: A map of buildings colored by their electricity consumption. When a building is selected, a bar plot indicates how this building is doing with respect to the average energy efficiency measured in 4 dimensions.
- Clustering: A map of buildings classified as efficient or inefficient.
- Prediction: Scoring of the linear regression model built in the notebook to predict the energy usage in kWh and annual cost of electricity for the buildings. On the left side are sliders for selecting the properties of the building to score the model.
- Raw Data: We use the Data.Tables package to display the data set with search and sorting capabilities.
You can check out the 10-minute demo of IBM Data Science Experience here:
We also created a GitHub repository with all of the material and instructions needed to run this demo. Enjoy!