https://www.r-bloggers.com/multiple-regression-part-2-diagnostics/
http://www.r-exercises.com/2017/01/26/multiple-regression-part-2-diagnostics-solutions/
Source: http://engdashboard.blogspot.com/
Multiple regression is one of the most widely used methods in statistical modelling. However, despite its many benefits, it is often used without checking the underlying assumptions. This can lead to results that are misleading or even completely wrong. Therefore, applying diagnostics to detect any strong violations of the assumptions is important. In the exercises below we cover some material on multiple regression diagnostics in R.
Answers to the exercises are available here.
If you obtain a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
Multiple Regression (Part 1) can be found here.
We will be using the dataset state.x77, which is part of the state datasets available in R. (Additional information about the dataset can be obtained by running help(state.x77).)
Exercise 1
a. Load the state datasets.
b. Convert the state.x77 dataset to a dataframe.
c. Rename the Life Exp variable to Life.Exp, and HS Grad to HS.Grad. (This avoids problems with referring to these variables when specifying a model.)
d. Produce the correlation matrix.
e. Create a scatterplot matrix for the variables Life.Exp, HS.Grad, Murder, and Frost.
Exercise 2
a. Fit the model with Life.Exp as dependent variable, and HS.Grad and Murder as predictors.
b. Obtain the residuals.
c. Obtain the fitted values.
Exercise 3
a. Create a residual plot (residuals vs. fitted values).
b. Create the same residual plot using the plot command on the lm object from Exercise 2.
Learn more about multiple linear regression in the online courses Linear regression in R for Data Scientists, Statistics with R – advanced level, and Linear Regression and Modeling.
Exercise 4
Create plots of the residuals vs. each of the predictor variables.
Exercise 5
a. Create a Normality plot.
b. Create the same plot using the plot command on the lm object from Exercise 2.
Exercise 6
a. Obtain the studentized residuals.
b. Do there seem to be any outliers?
Exercise 7
a. Obtain the leverage value for each observation and plot them.
b. Obtain the conventional threshold for leverage values. Are any observations influential?
Exercise 8
a. Obtain DFFITS values.
b. Obtain the conventional threshold. Are any observations influential?
c. Obtain DFBETAS values.
d. Obtain the conventional threshold. Are any observations influential?
Exercise 9
a. Obtain Cook's distance values and plot them.
b. Obtain the same plot using the plot command on the lm object from Exercise 2.
c. Obtain the threshold value. Are any observations influential?
Exercise 10
Create the Influence Plot using a function from the car package.
_____________________________________________________________
Below are the solutions to these exercises on Multiple Regression (part 2).
####################
#                  #
#    Exercise 1    #
#                  #
####################

#a.
data(state)
#b.
state77 <- as.data.frame(state.x77)
#c.
names(state77)[4] <- "Life.Exp"
names(state77)[6] <- "HS.Grad"
#d.
round(cor(state77),3) #displays correlations to three decimal places
##            Population Income Illiteracy Life.Exp Murder HS.Grad  Frost
## Population      1.000  0.208      0.108   -0.068  0.344  -0.098 -0.332
## Income          0.208  1.000     -0.437    0.340 -0.230   0.620  0.226
## Illiteracy      0.108 -0.437      1.000   -0.588  0.703  -0.657 -0.672
## Life.Exp       -0.068  0.340     -0.588    1.000 -0.781   0.582  0.262
## Murder          0.344 -0.230      0.703   -0.781  1.000  -0.488 -0.539
## HS.Grad        -0.098  0.620     -0.657    0.582 -0.488   1.000  0.367
## Frost          -0.332  0.226     -0.672    0.262 -0.539   0.367  1.000
## Area            0.023  0.363      0.077   -0.107  0.228   0.334  0.059
##              Area
## Population  0.023
## Income      0.363
## Illiteracy  0.077
## Life.Exp   -0.107
## Murder      0.228
## HS.Grad     0.334
## Frost       0.059
## Area        1.000
#e.
#the formula needs a leading ~ so that pairs() reads the variables from state77
pairs(~ Life.Exp + HS.Grad + Murder + Frost, data=state77, gap=0)
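Renaming columns by position, as in part c, works here but breaks silently if the column order ever changes. A minimal sketch of an alternative, assuming you are happy to let base R's make.names() produce the syntactic names (state77_alt is a hypothetical name used only for this illustration):

```r
# Sketch: derive syntactic column names instead of renaming by position.
# make.names() replaces the space in "Life Exp" and "HS Grad" with a dot.
state77_alt <- as.data.frame(state.x77)
names(state77_alt) <- make.names(names(state77_alt))

# Inspect the result; the two renamed columns sit in positions 4 and 6
names(state77_alt)
```

All other column names are already syntactic, so they are left untouched.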
####################
#                  #
#    Exercise 2    #
#                  #
####################

#a.
model <- lm(Life.Exp ~ HS.Grad + Murder, data=state77)
summary(model)
## 
## Call:
## lm(formula = Life.Exp ~ HS.Grad + Murder, data = state77)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.66758 -0.41801  0.05602  0.55913  2.05625 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 70.29708    1.01567  69.213  < 2e-16 ***
## HS.Grad      0.04389    0.01613   2.721  0.00909 ** 
## Murder      -0.23709    0.03529  -6.719 2.18e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7959 on 47 degrees of freedom
## Multiple R-squared:  0.6628, Adjusted R-squared:  0.6485 
## F-statistic:  46.2 on 2 and 47 DF,  p-value: 8.016e-12
#b.
resids <- model$residuals
#c.
fitted <- model$fitted.values

####################
#                  #
#    Exercise 3    #
#                  #
####################

#a.
plot(fitted,resids,main="Residual Plot",xlab="Fitted Values",ylab="Residuals")
abline(h=0,col="red")
#b.
plot(model,which=1)
####################
#                  #
#    Exercise 4    #
#                  #
####################

par(mfrow=c(1,2)) # draw the 2 plots side by side
plot(state77$HS.Grad,resids,main="Residuals vs. HS.Grad",xlab="HS.Grad",ylab="Residuals")
abline(h=0,col="red")
plot(state77$Murder,resids,main="Residuals vs. Murder",xlab="Murder",ylab="Residuals")
abline(h=0,col="red")
par(mfrow=c(1,1)) # restore to the default

####################
#                  #
#    Exercise 5    #
#                  #
####################

#a.
qqnorm(resids,ylab="Residuals")
qqline(resids)
#b.
plot(model,which=2)
####################
#                  #
#    Exercise 6    #
#                  #
####################

#a.
stzed <- rstudent(model)
#b.
stzed[abs(stzed) > 2]
##    Hawaii     Maine 
##  2.835488 -2.249583
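It can help to see where the values returned by rstudent() come from. The sketch below is an illustration rather than part of the original solutions: it rebuilds the externally studentized residuals from the internally studentized ones given by rstandard(), using the standard identity t_i = r_i * sqrt((n - p - 1) / (n - p - r_i^2)). The names r_int and r_ext are made up for this example.

```r
# Rebuild the data and model from Exercises 1 and 2 so the sketch stands alone
state77 <- as.data.frame(state.x77)
names(state77)[4] <- "Life.Exp"
names(state77)[6] <- "HS.Grad"
model <- lm(Life.Exp ~ HS.Grad + Murder, data = state77)

n <- nrow(state77)          # number of observations
p <- length(coef(model))    # number of estimated coefficients, intercept included

r_int <- rstandard(model)   # internally studentized residuals
# Convert to externally studentized (leave-one-out) residuals
r_ext <- r_int * sqrt((n - p - 1) / (n - p - r_int^2))

# Should agree with rstudent() up to floating-point error
all.equal(r_ext, rstudent(model))

# Same rule of thumb as in part b
r_ext[abs(r_ext) > 2]
```

The cut-off of 2 is only a rough screen. With 50 observations, a couple of values beyond it are not unusual on their own.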
####################
#                  #
#    Exercise 7    #
#                  #
####################

#a.
lever <- hat(model.matrix(model))
plot(lever)
#b.
#obtain the threshold
thresh2 <- 2*length(model$coefficients)/length(lever)
#print leverage values above threshold
lever[lever > thresh2]
## [1] 0.1728282 0.1571342
#print corresponding state names
rownames(state77)[which(lever > thresh2)]
## [1] "Alaska" "Nevada"
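The solution above needs a second step to recover the state names because hat() returns an unnamed vector. A possible shortcut, sketched here as an alternative and not as the original author's method, is hatvalues(), which keeps the row names. The variable names lev, p and n are chosen for this illustration.

```r
# Rebuild the data and model from Exercises 1 and 2 so the sketch stands alone
state77 <- as.data.frame(state.x77)
names(state77)[4] <- "Life.Exp"
names(state77)[6] <- "HS.Grad"
model <- lm(Life.Exp ~ HS.Grad + Murder, data = state77)

lev <- hatvalues(model)     # named leverage values, one per state
p <- length(coef(model))    # number of coefficients, intercept included
n <- length(lev)            # number of observations

# Leverages sum to p, so the average is p/n and the usual cut-off is twice that
sum(lev)
lev[lev > 2 * p / n]
```

Because the mean leverage is p/n, the 2p/n rule simply flags observations with more than twice the average leverage.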
####################
#                  #
#    Exercise 8    #
#                  #
####################

#a.
dffits1 <- dffits(model)
#b.
thresh3 <- 2*sqrt(length(model$coefficients)/length(dffits1))
#note: this lists only large positive values; compare abs(dffits1) with thresh3 to catch negative ones too
dffits1[dffits1 > thresh3]
##    Hawaii 
## 0.6182642
#c.
dfbetas1 <- dfbetas(model)
#d.
thresh4 <- 2/sqrt(length(dfbetas1[,1]))
#note: as in part b, only positive values above the threshold are listed
dfbetas1[dfbetas1[,1] > thresh4,1] #for intercept
##    Alaska    Nevada 
## 0.7036719 0.7400875
dfbetas1[dfbetas1[,2] > thresh4,2] #for HS.Grad
##     California         Hawaii South Carolina  West Virginia 
##      0.3993425      0.4430609      0.3855697      0.3613108
dfbetas1[dfbetas1[,3] > thresh4,3] #for Murder
## California      Maine      Texas 
##  0.3491024  0.4441167  0.3038958
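DFFITS and DFBETAS can be strongly negative as well as strongly positive, and the conventional thresholds are normally applied to absolute values. The following sketch shows one way to do that, collecting all flagged DFBETAS in a single table. It is an assumption about how you might want to extend the solution, and the names dff, dfb, dff_cut, dfb_cut and flagged are invented for the example.

```r
# Rebuild the data and model from Exercises 1 and 2 so the sketch stands alone
state77 <- as.data.frame(state.x77)
names(state77)[4] <- "Life.Exp"
names(state77)[6] <- "HS.Grad"
model <- lm(Life.Exp ~ HS.Grad + Murder, data = state77)

n <- nrow(state77)
p <- length(coef(model))

# DFFITS: flag observations in either direction
dff <- dffits(model)
dff_cut <- 2 * sqrt(p / n)
dff[abs(dff) > dff_cut]

# DFBETAS: find every (state, coefficient) pair beyond the cut-off
dfb <- dfbetas(model)
dfb_cut <- 2 / sqrt(n)
flagged <- which(abs(dfb) > dfb_cut, arr.ind = TRUE)
data.frame(state = rownames(dfb)[flagged[, "row"]],
           coefficient = colnames(dfb)[flagged[, "col"]],
           dfbetas = dfb[flagged],
           row.names = NULL)
```

Every state listed in the solution output above also appears here. Any extra rows are observations whose influence is negative.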
####################
#                  #
#    Exercise 9    #
#                  #
####################

#a.
cooksd <- cooks.distance(model)
plot(cooksd,ylab="Cook's Distance")
#b.
plot(model,which=4)
#c.
thresh <- 4/length(resids)
cooksd[cooksd > thresh]
##         Alaska         Hawaii          Maine         Nevada South Carolina 
##     0.20282640     0.11081779     0.09477421     0.22879279     0.09423789
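Cook's distance combines the two ingredients from Exercises 6 and 7: how large the residual is and how much leverage the observation has. As a hedged illustration (not part of the original solutions), it can be rebuilt from the internally studentized residuals r_i and the leverages h_i as D_i = (r_i^2 / p) * h_i / (1 - h_i). The name cooks_manual is hypothetical.

```r
# Rebuild the data and model from Exercises 1 and 2 so the sketch stands alone
state77 <- as.data.frame(state.x77)
names(state77)[4] <- "Life.Exp"
names(state77)[6] <- "HS.Grad"
model <- lm(Life.Exp ~ HS.Grad + Murder, data = state77)

h <- hatvalues(model)       # leverage of each observation
r <- rstandard(model)       # internally studentized residuals
p <- length(coef(model))    # number of coefficients, intercept included

# Cook's distance from its residual and leverage components
cooks_manual <- (r^2 / p) * (h / (1 - h))

# Should agree with cooks.distance() up to floating-point error
all.equal(cooks_manual, cooks.distance(model))

# Same 4/n threshold as in part c
cooks_manual[cooks_manual > 4 / length(h)]
```

This makes it easier to see why Alaska and Nevada (high leverage) and Hawaii (large residual) all show up in the list.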
####################
#                  #
#    Exercise 10   #
#                  #
####################

library(car)
influencePlot(model, main="Influence Plot")
##          StudRes        Hat     CookD
## Alaska -1.743144 0.17282816 0.2028264
## Hawaii  2.835488 0.04538585 0.1108178
## Nevada -1.977284 0.15713419 0.2287928
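If the car package is not available, base R's influence.measures() gives a tabular overview of the same diagnostics. This is a sketch of an alternative, not a replacement for the influence plot. Note that influence.measures() applies its own built-in cut-offs, which are not the same as the thresholds used in the exercises above, so the flagged states may differ. The names infl and flag_counts are made up for this example.

```r
# Rebuild the data and model from Exercises 1 and 2 so the sketch stands alone
state77 <- as.data.frame(state.x77)
names(state77)[4] <- "Life.Exp"
names(state77)[6] <- "HS.Grad"
model <- lm(Life.Exp ~ HS.Grad + Murder, data = state77)

# DFBETAS, DFFITS, covariance ratio, Cook's distance and leverage in one object
infl <- influence.measures(model)

# Print only the observations flagged by at least one measure
summary(infl)

# Count how many measures flag each state (is.inf is a logical matrix)
flag_counts <- rowSums(infl$is.inf)
flag_counts[flag_counts > 0]
```

States flagged by several measures at once are usually the ones worth a closer look.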