Source: http://lenkiefer.com/2017/04/26/housing-data-uncertainty/
VISUALIZING UNCERTAINTY IN HOUSING DATA
PUBLISHED BY LEN KIEFER
Housing data are often measured with considerable uncertainty. Estimates are commonly based on small samples that are subject to sampling variability. The various government statistical agencies commonly report estimates of uncertainty with their releases. For example, both the New Residential Construction and New Residential Sales reports include estimates of sampling uncertainty along with their point estimates.
In this post I want to explore ways to visualize sampling uncertainty with [...] article from the New York Times Upshot blog a few years ago.
Data
For data, let’s go ahead and use New Home Sales estimates from the U.S. Census Bureau and U.S. Department of Housing and Urban Development. The Census provides a nice .csv file you can download here. The spreadsheet includes estimates of sampling uncertainty.
If you go to this link you can get a zip file that contains the data we’ll use. If you open the .csv file in Excel, you will find the data actually begins on row 705 (as of Apr 26, 2017; it will move over time). Let’s assume you’ve unzipped the .csv file and saved it somewhere as RESSALES-mf.csv.
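Because the starting row moves as new months are added, one option is to locate the header line instead of hard-coding row 705. The sketch below is an illustration, not the loading code used in this post. It assumes the data block's header line begins with `per_idx,cat_idx` (the first two column names in the table shown below) and that the data block runs to the end of the file. `read_data_block` is a hypothetical helper, and the toy file only stands in for RESSALES-mf.csv.

```r
# Hypothetical helper: find the data block's header line and read from there.
# Assumption: the header line begins with "per_idx,cat_idx" and the data block
# runs to the end of the file.
read_data_block <- function(path, header_pattern = "^per_idx,cat_idx") {
  lines <- readLines(path)
  header_row <- grep(header_pattern, lines)[1]
  if (is.na(header_row)) stop("header line not found")
  # Skip everything before the header line, then read it as column names
  read.csv(path, skip = header_row - 1, header = TRUE)
}

# Toy file standing in for RESSALES-mf.csv (preamble contents are made up)
toy <- tempfile(fileext = ".csv")
writeLines(c(
  "TIME PERIODS",
  "per_idx,per_name",
  "651,Mar-17",
  "",
  "DATA",
  "per_idx,cat_idx,dt_idx,et_idx,geo_idx,is_adj,val",
  "651,3,1,0,3,0,34",
  "651,3,0,1,3,0,10"
), toy)

df.toy <- read_data_block(toy)
print(df.toy)
```

Matching on the first two column names, not only `per_idx`, avoids stopping at an earlier lookup section that happens to start with the same column.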
Note that this file is laid out much the same as the housing starts data we used [...]
    # Peek at the last rows of the data, rounding numeric columns
    # (is_numeric() and as.tbl() are deprecated in newer purrr/dplyr;
    # is.numeric() and tibble::as_tibble() are the current equivalents)
    htmlTable::htmlTable(rbind(tail(df.sales %>% map_if(is_numeric,round,0) %>% data.frame() %>% as.tbl())))
    ## Warning: Deprecated
| | per_idx | cat_idx | dt_idx | et_idx | geo_idx | is_adj | val | date |
|---|---|---|---|---|---|---|---|---|
| 1 | 651 | 3 | 1 | 0 | 3 | 0 | 34 | 2017-03-01 |
| 2 | 651 | 3 | 0 | 1 | 3 | 0 | 10 | 2017-03-01 |
| 3 | 651 | 3 | 1 | 0 | 4 | 0 | 144 | 2017-03-01 |
| 4 | 651 | 3 | 0 | 1 | 4 | 0 | 6 | 2017-03-01 |
| 5 | 651 | 3 | 1 | 0 | 5 | 0 | 62 | 2017-03-01 |
| 6 | 651 | 3 | 0 | 1 | 5 | 0 | 7 | 2017-03-01 |
Let’s organize the data a little bit more.

    # Filter to just the US, total sales at an annual rate
    new.sales <- filter(df.sales, cat_idx==2 & (dt_idx==1 | et_idx==1) & geo_idx==1)

    # Rearrange the data: keep 2000 onward, one row per date with the
    # estimate (et_idx 0) and its error (et_idx 1) in separate columns
    new.sales <- new.sales %>%
      filter(year(date)>1999) %>%
      select(date,val,et_idx) %>%
      spread(et_idx,val)

    # Rename columns
    colnames(new.sales) <- c("date","sales","e.sales")

    # Check it out:
    htmlTable::htmlTable(rbind(tail(new.sales %>% map_if(is_numeric,round,0) %>% data.frame() %>% as.tbl())))
    ## Warning: Deprecated
| | date | sales | e.sales |
|---|---|---|---|
| 1 | 2016-10-01 | 568 | 8 |
| 2 | 2016-11-01 | 573 | 8 |
| 3 | 2016-12-01 | 551 | 7 |
| 4 | 2017-01-01 | 585 | 8 |
| 5 | 2017-02-01 | 587 | 8 |
| 6 | 2017-03-01 | 621 | 8 |
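The `spread(et_idx, val)` step turns the long layout (one row per date and `et_idx`) into one row per date, with the estimate (`et_idx` 0) and its error (`et_idx` 1) side by side. Here is a minimal sketch of the same reshape in base R, using `reshape()` in place of tidyr so it runs without extra packages. The toy data frame `long` holds the last two months from the table above.

```r
# Long layout: two rows per date, et_idx 0 = estimate, et_idx 1 = error
long <- data.frame(
  date   = as.Date(rep(c("2017-02-01", "2017-03-01"), each = 2)),
  et_idx = rep(c(0, 1), times = 2),
  val    = c(587, 8, 621, 8)
)

# Wide layout: one row per date, one column per et_idx value
wide <- reshape(long, idvar = "date", timevar = "et_idx", direction = "wide")

# Rename columns the same way the post does after spread()
colnames(wide) <- c("date", "sales", "e.sales")
print(wide)
```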
Viz 1: Ribbon Chart
First, let’s remake a viz we’ve [...]
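A ribbon chart of this kind can be sketched roughly as follows. This is an illustration under stated assumptions, not necessarily the chart referred to above. It uses the six rounded rows from the table above and treats `e.sales` as a percentage of `sales`, as the sampling function later in this post does. The band is drawn at plus or minus 1.645 standard errors (about a 90 percent interval under normality); that multiplier is an assumption. The name `new.sales.tail` is made up for the example.

```r
# Values from the table above (sales in 1000s; e.sales treated as percent of sales)
new.sales.tail <- data.frame(
  date    = as.Date(c("2016-10-01", "2016-11-01", "2016-12-01",
                      "2017-01-01", "2017-02-01", "2017-03-01")),
  sales   = c(568, 573, 551, 585, 587, 621),
  e.sales = c(8, 8, 7, 8, 8, 8)
)

# Assumption: band at +/- qnorm(0.95) standard errors (about 90% under normality)
z <- qnorm(0.95)
new.sales.tail$se    <- new.sales.tail$e.sales / 100 * new.sales.tail$sales
new.sales.tail$lower <- new.sales.tail$sales - z * new.sales.tail$se
new.sales.tail$upper <- new.sales.tail$sales + z * new.sales.tail$se

# Draw the ribbon only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  g <- ggplot(new.sales.tail, aes(x = date, y = sales)) +
    geom_ribbon(aes(ymin = lower, ymax = upper), fill = "gray80", alpha = 0.5) +
    geom_line() +
    theme_minimal() +
    labs(x = "", y = "", title = "New home sales (1000s, SAAR)")
  print(g)
}
```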
Viz 2: Gif
Instead of using a ribbon, let’s draw random samples and animate them to highlight uncertainty.

    # Function for sampling: 250 normal draws centered at the estimate,
    # with e.sales treated as a percentage of sales
    myf <- function(sales,e.sales){
      rnorm(250,sales,e.sales/100*sales)
    }

    # Draw samples using map2, and then unnest to blow up the data and group
    output.data <- new.sales %>%
      mutate(sales.samp=map2(sales,e.sales,myf)) %>%  # draw our samples
      unnest(sales.samp) %>%                          # unpack the samples
      group_by(date) %>%
      mutate(id=row_number()) %>%
      ungroup()  # this gives us an id for each sample

Now we can animate it:

    library(animation)  # saveGIF(), ani.options(), ani.pause()

    # Animate plot: frame i shows samples 1..i, with sample i highlighted
    oopt = ani.options(interval = 0.25)
    saveGIF({for (i in 1:100) {
      g <-
        ggplot(data=filter(output.data,year(date)>2015 & id<=i),
               aes(x=date,y=sales.samp,group=id))+
        geom_line(color="gray50",aes(alpha=ifelse(id==i,1,0.2)))+
        #geom_line(data=filter(output.data,id==i),color="red",alpha=1,size=1.05)+
        guides(alpha="none")+
        geom_point(size=3,color="black",aes(y=sales))+
        theme_minimal()+
        labs(x="",y="",
             title="New home sales (1000s, SAAR)",
             subtitle="Black dots estimates, each gray line a random sample from normal with survey standard error",
             caption="@lenkiefer Source: U.S. Census Bureau and U.S. Department of Housing and Urban Development")+
        coord_cartesian(xlim=as.Date(c("2016-01-01","2017-03-01")),ylim=c(400,700))+
        theme(plot.caption=element_text(hjust=0))
      print(g)
      ani.pause()
      print(paste(i,"out of 100"))  # counter
    }
    },movie.name="newsales_04_26_2017 samp ex.gif",ani.width = 600, ani.height = 450)
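To see what `myf` does for a single month, here is a small worked example in base R. It uses the March 2017 row from the table above (sales of 621 with `e.sales` of 8) and the same `rnorm` call. The seed is arbitrary and only there to make the draws reproducible.

```r
set.seed(20170426)  # arbitrary seed so the draws are reproducible

sales   <- 621  # March 2017 estimate (1000s, SAAR), from the table above
e.sales <- 8    # treated as a percent of sales, as in myf

# Standard deviation implied by the sampling function: 8% of 621 = 49.68
sd.sales <- e.sales / 100 * sales
draws    <- rnorm(250, sales, sd.sales)  # same call as inside myf

# Compare the spread of the draws with the theoretical 90% interval
theory    <- qnorm(c(0.05, 0.95), mean = sales, sd = sd.sales)
empirical <- quantile(draws, c(0.05, 0.95))
print(round(rbind(theory, empirical), 1))
```

Under these assumptions the theoretical 90 percent interval runs from roughly 540 to 700 thousand, which gives a sense of how wide the cloud of sampled lines should be around each black dot.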
Viz 3: Beeswarm
We can also make a beeswarm plot (for more [...]).
And we could animate it:

    library(animation)   # saveGIF(), ani.options(), ani.pause()
    library(ggbeeswarm)  # geom_quasirandom()
    library(viridis)     # scale_color_viridis()

    # Animate plot: frame i shows samples 1..i, with sample i highlighted
    oopt = ani.options(interval = 0.2)
    saveGIF({for (i in 1:200) {
      g <-
        ggplot(data=filter(output.data,date>="2016-03-01" & id<=i),
               aes(x=date,y=sales.samp,color=sales.samp,
                   alpha=ifelse(id==i,1,0.2)))+
        scale_color_viridis(name="")+
        guides(color="none")+
        geom_quasirandom()+theme_minimal()+
        # one black dot per month for the point estimate
        geom_point(data=filter(output.data,date>="2016-03-01" & id==1),
                   aes(y=sales),color="black",size=3,alpha=1)+
        scale_x_date(date_labels="%b-%Y",date_breaks="2 months",
                     limits=as.Date(c("2016-02-15","2017-04-15")))+
        scale_y_continuous(limits=c(400,800))+
        guides(alpha="none")+
        labs(x="",y="",
             title="New Home Sales (1000s SAAR)",
             subtitle="Estimates (black dots) and sampling uncertainty",
             caption="@lenkiefer Source: U.S. Census Bureau and U.S. Department of Housing and Urban Development\ncolored dots represent draws from a normal distribution centered at estimate with standard error of estimate.")+
        theme(plot.caption=element_text(hjust=0))
      print(g)
      ani.pause()
      print(paste(i,"out of 200"))  # counter (matches the 200 frames in the loop)
    }
    },movie.name="new home sales swarm.gif",ani.width = 600, ani.height = 450)
Conclusion
Visualizing uncertainty can be challenging. Depending on the audience, uncertainty can be a hard concept. I’m not sure the data visualization field has a consensus on the right way to visualize uncertainty.
But communicating uncertainty can be quite important. Maybe one of these ideas could work for you.