Multiple Regression (Part 2) – Diagnostics + Solutions - R

Multiple regression is one of the most widely used methods in statistical modelling. However, despite its many benefits, it is often used without checking the underlying assumptions. This can lead to results that are misleading or even completely wrong. Therefore, it is important to apply diagnostics to detect any serious violations of the assumptions. The exercises below cover some material on multiple regression diagnostics in R.
Answers to the exercises are available here.
If you obtain a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
Multiple Regression (Part 1) can be found here.
We will be using the dataset state.x77, which is part of the state datasets available in R. (Additional information about the dataset can be obtained by running help(state.x77).)
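As an orientation aid before starting, the object can be inspected with a few base R calls. This is only a minimal sketch and not part of the exercises; it shows that state.x77 is stored as a matrix and that two of its column names contain spaces.

```r
# state.x77 ships with base R's datasets package
data(state)

is.matrix(state.x77)   # TRUE: it is a numeric matrix, not a data frame
dim(state.x77)         # number of rows (states) and columns (variables)
colnames(state.x77)    # note the spaces in "Life Exp" and "HS Grad"
```

The spaces in the column names are the reason the exercises below rename two of the variables.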
Exercise 1
a. Load the state datasets.
b. Convert the state.x77 dataset to a dataframe.
c. Rename the Life Exp variable to Life.Exp, and HS Grad to HS.Grad. (This avoids problems with referring to these variables when specifying a model.)
d. Produce the correlation matrix.
e. Create a scatterplot matrix for the variables Life.Exp, HS.Grad, Murder, and Frost.
Exercise 2
a. Fit the model with Life.Exp as the dependent variable, and HS.Grad and Murder as predictors.
b. Obtain the residuals.
c. Obtain the fitted values.
Exercise 3
a. Create a residual plot (residuals vs. fitted values).
b. Create the same residual plot using the plot command on the lm object from Exercise 2.

Exercise 4
Create plots of the residuals vs. each of the predictor variables.
Exercise 5
a. Create a Normality plot.
b. Create the same plot using the plot command on the lm object from Exercise 2.
Exercise 6
a. Obtain the studentized residuals.
b. Do there seem to be any outliers?
Exercise 7
a. Obtain the leverage value for each observation and plot them.
b. Obtain the conventional threshold for leverage values. Are any observations influential?
Exercise 8
a. Obtain DFFITS values.
b. Obtain the conventional threshold. Are any observations influential?
c. Obtain DFBETAS values.
d. Obtain the conventional threshold. Are any observations influential?
Exercise 9
a. Obtain Cook’s distance values and plot them.
b. Obtain the same plot using the plot command on the lm object from Exercise 2.
c. Obtain the threshold value. Are any observations influential?
Exercise 10
Create the Influence Plot using a function from the car package.


Below are the solutions to these exercises on Multiple Regression (part 2).
####################
#                  #
#    Exercise 1    #
#                  #
####################

#a.
data(state)

#b.
state77 <- as.data.frame(state.x77)

#c.
names(state77)[4] <- "Life.Exp"
names(state77)[6] <- "HS.Grad"

#d.
round(cor(state77),3) #displays correlations to three decimal places
##            Population Income Illiteracy Life.Exp Murder HS.Grad  Frost
## Population      1.000  0.208      0.108   -0.068  0.344  -0.098 -0.332
## Income          0.208  1.000     -0.437    0.340 -0.230   0.620  0.226
## Illiteracy      0.108 -0.437      1.000   -0.588  0.703  -0.657 -0.672
## Life.Exp       -0.068  0.340     -0.588    1.000 -0.781   0.582  0.262
## Murder          0.344 -0.230      0.703   -0.781  1.000  -0.488 -0.539
## HS.Grad        -0.098  0.620     -0.657    0.582 -0.488   1.000  0.367
## Frost          -0.332  0.226     -0.672    0.262 -0.539   0.367  1.000
## Area            0.023  0.363      0.077   -0.107  0.228   0.334  0.059
##              Area
## Population  0.023
## Income      0.363
## Illiteracy  0.077
## Life.Exp   -0.107
## Murder      0.228
## HS.Grad     0.334
## Frost       0.059
## Area        1.000
#e.
pairs(~ Life.Exp + HS.Grad + Murder + Frost, data=state77, gap=0)
####################
#                  #
#    Exercise 2    #
#                  #
####################

#a.
model <- lm(Life.Exp ~ HS.Grad + Murder, data=state77)
summary(model)
##
## Call:
## lm(formula = Life.Exp ~ HS.Grad + Murder, data = state77)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.66758 -0.41801  0.05602  0.55913  2.05625
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 70.29708    1.01567  69.213  < 2e-16 ***
## HS.Grad      0.04389    0.01613   2.721  0.00909 **
## Murder      -0.23709    0.03529  -6.719 2.18e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7959 on 47 degrees of freedom
## Multiple R-squared:  0.6628, Adjusted R-squared:  0.6485
## F-statistic:  46.2 on 2 and 47 DF,  p-value: 8.016e-12
#b.
resids <- model$residuals

#c.
fitted <- model$fitted.values

####################
#                  #
#    Exercise 3    #
#                  #
####################

#a.
plot(fitted,resids,main="Residual Plot",xlab="Fitted Values",ylab="Residuals")
abline(h=0,col="red")
 Residual Plot
#b.
plot(model,which=1)
 Residual Plot using plot command
####################
#                  #
#    Exercise 4    #
#                  #
####################

par(mfrow=c(1,2)) # draw the 2 plots side by side

plot(state77$HS.Grad,resids,main="Residuals vs. HS.Grad",xlab="HS.Grad",ylab="Residuals")
abline(h=0,col="red")
plot(state77$Murder,resids,main="Residuals vs. Murder",xlab="Murder",ylab="Residuals")
abline(h=0,col="red")
 Residuals vs. predictor variables
par(mfrow=c(1,1)) # restore to the default

####################
#                  #
#    Exercise 5    #
#                  #
####################

#a.
qqnorm(resids,ylab="Residuals")
qqline(resids)
 Normal Q-Q Plot
#b.
plot(model,which=2)
 Normal Q-Q Plot using plot command
####################
#                  #
#    Exercise 6    #
#                  #
####################

#a.
stzed <- rstudent(model)

#b.
stzed[abs(stzed) > 2]
##    Hawaii     Maine
##  2.835488 -2.249583
####################
#                  #
#    Exercise 7    #
#                  #
####################

#a.
lever <- hat(model.matrix(model))
plot(lever)
 Leverage plot
#b.
#obtain the threshold
thresh2 <- 2*length(model$coefficients)/length(lever)

#print leverage values above threshold
lever[lever > thresh2]
## [1] 0.1728282 0.1571342 
#print corresponding state names
rownames(state77)[which(lever > thresh2)]
## [1] "Alaska" "Nevada" 
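The leverage lookup above can also be sketched with hatvalues(), a base R function that returns the hat values as a vector named by observation, so the state names appear without a separate rownames() step. The setup lines repeat the earlier exercises so the block runs on its own; the variable names lever_named and lever_cut are illustrative only.

```r
# Rebuild the data frame and model from Exercises 1 and 2
data(state)
state77 <- as.data.frame(state.x77)
names(state77)[names(state77) == "Life Exp"] <- "Life.Exp"
names(state77)[names(state77) == "HS Grad"] <- "HS.Grad"
model <- lm(Life.Exp ~ HS.Grad + Murder, data = state77)

# hatvalues() keeps the row names of the data as vector names
lever_named <- hatvalues(model)

# Same conventional threshold as above: 2 * (number of coefficients) / n
lever_cut <- 2 * length(coef(model)) / nobs(model)

# Observations with leverage above the threshold, labelled by state
lever_named[lever_named > lever_cut]
```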
####################
#                  #
#    Exercise 8    #
#                  #
####################

#a.
dffits1 <- dffits(model)

#b.
thresh3 <- 2*sqrt(length(model$coefficients)/length(dffits1))
dffits1[dffits1 > thresh3]
##    Hawaii
## 0.6182642
#c.
dfbetas1 <- dfbetas(model)

#d.
thresh4 <- 2/sqrt(length(dfbetas1[,1]))
dfbetas1[dfbetas1[,1] > thresh4,1]  #for intercept
##    Alaska    Nevada
## 0.7036719 0.7400875
dfbetas1[dfbetas1[,2] > thresh4,2]  #for HS.Grad 
##     California         Hawaii South Carolina  West Virginia
##      0.3993425      0.4430609      0.3855697      0.3613108
dfbetas1[dfbetas1[,3] > thresh4,3]  #for Murder 
## California      Maine      Texas
##  0.3491024  0.4441167  0.3038958
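The comparisons in Exercise 8 only pick up values above the positive threshold. DFFITS and DFBETAS can also be strongly negative, and the conventional thresholds are usually applied to absolute values. A minimal sketch of that variant follows, assuming the same model as in Exercise 2; the names dffits_cut, dfbetas_cut and flagged are illustrative, and the setup lines are repeated so the block runs on its own.

```r
# Rebuild the data frame and model from Exercises 1 and 2
data(state)
state77 <- as.data.frame(state.x77)
names(state77)[names(state77) == "Life Exp"] <- "Life.Exp"
names(state77)[names(state77) == "HS Grad"] <- "HS.Grad"
model <- lm(Life.Exp ~ HS.Grad + Murder, data = state77)

p <- length(coef(model))   # number of estimated coefficients
n <- nobs(model)           # number of observations

# DFFITS: flag observations beyond the threshold in either direction
dffits_vals <- dffits(model)
dffits_cut <- 2 * sqrt(p / n)
dffits_vals[abs(dffits_vals) > dffits_cut]

# DFBETAS: logical matrix marking every observation/coefficient pair
# whose absolute value exceeds the threshold
dfbetas_vals <- dfbetas(model)
dfbetas_cut <- 2 / sqrt(n)
flagged <- abs(dfbetas_vals) > dfbetas_cut

# Show all rows (states) flagged for at least one coefficient
dfbetas_vals[rowSums(flagged) > 0, , drop = FALSE]
```

Because this applies the threshold on both sides, it may list more states than the one-sided comparisons above.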
####################
#                  #
#    Exercise 9    #
#                  #
####################

#a.
cooksd <- cooks.distance(model)
plot(cooksd,ylab="Cook's Distance")
 Cook's Distance plot
#b.
plot(model,which=4)
 Cook's Distance using plot command
#c.
thresh <- 4/length(resids)
cooksd[cooksd > thresh]
##         Alaska         Hawaii          Maine         Nevada South Carolina
##     0.20282640     0.11081779     0.09477421     0.22879279     0.09423789
####################
#                  #
#    Exercise 10   #
#                  #
####################

library(car)
influencePlot(model, main="Influence Plot")
 Influence Plot
##          StudRes        Hat     CookD
## Alaska -1.743144 0.17282816 0.2028264
## Hawaii  2.835488 0.04538585 0.1108178
## Nevada -1.977284 0.15713419 0.2287928
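For a single overview without the car package, base R's influence.measures() collects DFBETAS, DFFITS, covariance ratios, Cook's distances and hat values in one object. The sketch below is an optional complement to the exercises, not part of the original solutions. Note that influence.measures() marks cases using its own built-in cutoffs, which differ from the conventional thresholds used above, so the flagged states need not match exactly.

```r
# Rebuild the data frame and model from Exercises 1 and 2
data(state)
state77 <- as.data.frame(state.x77)
names(state77)[names(state77) == "Life Exp"] <- "Life.Exp"
names(state77)[names(state77) == "HS Grad"] <- "HS.Grad"
model <- lm(Life.Exp ~ HS.Grad + Murder, data = state77)

# One row per state, one column per influence measure
im <- influence.measures(model)
colnames(im$infmat)

# Print only the observations flagged on at least one measure
summary(im)
```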


