Generalized Linear Functions (Beginners) + Solutions - R






In this set of exercises, we are going to use the lm and glm functions to fit several generalized linear models on one dataset.
Since this is a basic set of exercises, we will take a closer look at the arguments of these functions and at how to take advantage of the output of each function, so we can find a model that fits our data.
Before starting this set of exercises, I strongly suggest you look at the R documentation of lm and glm.
Note: This set of exercises assumes that you have a basic understanding of generalized linear models.
Answers to the exercises are available here.
If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
The dataset we will be using contains information on passengers of the Titanic, including whether they survived or not.
To obtain the data, run these lines of code.

if (!'titanic' %in% installed.packages()) install.packages('titanic')
library(titanic)
DATA <- titanic_train[,-c(1,4,9,11)]
Exercise 1
Linear regression
1. Use DATA to fit a linear model with the function lm, using the variables Age and Fare as independent variables and Survived as the dependent one. Save the regression in an object called lm_reg
2. Use the function glm to perform the same task and save the regression in an object called glm_model
Exercise 2
If you print any of the previous objects, you will realize that there's not much information about the performance of the models. Fortunately, summary is a great function for finding out more about any statistical model you fit to a dataset. Depending on the model, summary will produce different outputs.
  • Apply summary to lm_reg and to glm_model. You will find a slight difference between the two outputs; that is because glm is more flexible than lm.
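As an aside: a model fit with lm and the same model fit with glm under the gaussian family produce identical coefficient estimates; only the summary output differs (R-squared and F-statistic versus deviance and AIC). A minimal sketch using R's built-in mtcars data (not the Titanic data):

```r
# Fit the same linear model two ways, on the built-in mtcars data
lm_fit  <- lm(mpg ~ wt, data = mtcars)
glm_fit <- glm(mpg ~ wt, data = mtcars, family = gaussian)

# The coefficient estimates are identical...
all.equal(coef(lm_fit), coef(glm_fit))
# ...but summary(lm_fit) reports R-squared, while summary(glm_fit)
# reports deviance and AIC instead
```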
Exercise 3
So far we have been assuming (incorrectly) that the dependent variable (Survived) follows a normal distribution, and that's why we have been performing a linear regression. Obviously, Survived follows a binomial distribution: there are only two options, either the passenger survived (1) or the passenger wasn't that lucky and died (0). Since the data has a binomial distribution, we should perform a logistic regression. To do this, use the function glm to perform a logistic regression using Age and Fare as independent variables, and save it in an object called bin_model. Hint: Define the value of the argument family properly.
Exercise 4
Inside the family argument you can always specify a particular link; in case you don't, a default link will be used depending on the family you chose.
1. To find out the default link associated with a certain family, you can write the family name followed by parentheses (e.g. gaussian()). Find the default link associated with the binomial family.
2. Create a probit model with the same variables used in bin_model and save it in an object called bin_probit_model.
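To illustrate, each family function returns a family object whose link element records the default link (a small sketch, independent of the Titanic data):

```r
# Calling a family function with no arguments reveals its default link
gaussian()$link  # "identity"
binomial()$link  # "logit"
# A different link can be requested explicitly, e.g. binomial(link = probit)
```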
Exercise 5
Finding the right model requires comparing different models and selecting the best. Although there are many performance measures, for now we will use the AIC as our measure (smaller AIC values are better). By this criterion, bin_model is better than bin_probit_model, so let's continue working with bin_model.
Until now, an intercept has been part of the models. Create a logistic regression with the same variables but with no intercept.
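Both steps can be sketched on R's built-in mtcars data (am is a 0/1 variable standing in for Survived): the AIC function extracts the AIC from a fitted model, and prepending 0 + to the formula (or appending - 1) removes the intercept.

```r
logit_fit  <- glm(am ~ wt, data = mtcars, family = binomial(link = logit))
probit_fit <- glm(am ~ wt, data = mtcars, family = binomial(link = probit))

# Compare models by AIC; smaller is better
AIC(logit_fit)
AIC(probit_fit)

# Drop the intercept with 0 + (equivalently, am ~ wt - 1)
no_int_fit <- glm(am ~ 0 + wt, data = mtcars, family = binomial)
names(coef(no_int_fit))  # no "(Intercept)" term
```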
Exercise 6
Impute data. If you run the summary function on any of the previous models, you will find that 177 observations have been deleted due to missingness. This happens because the glm function has the default argument na.action = "na.omit". This makes it easier to run a model on messier data, but that is not always great: you want to have full control and an understanding of what the function is doing.
1. There are some missing values in Age; replace these values with the median.
2. Update glm_model with the updated data, specifying na.action = 'na.fail'. This will assure us that the dataset has no missing values; otherwise it will show an error.
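The replacement step can be sketched on a toy vector (hypothetical data, not the Titanic ages):

```r
x <- c(1, 2, NA, 4)
# Compute the median over the non-missing values only
med <- median(x, na.rm = TRUE)  # 2
# Replace every NA with that median
x[is.na(x)] <- med
x  # 1 2 2 4
```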
Learn more about evaluating different statistical models in the online courses Linear regression in R for Data Scientists and Structural equation modeling (SEM) with lavaan. These courses cover different statistical models that can help you select the right design for your solution.
Exercise 7
Add polynomial independent variables. Some variables have a quadratic relationship with the dependent variable; this can be handled by specifying a quadratic term in the formula of the model.
Add a quadratic term for the variable Fare to the current model, specified in glm_model
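One way to add a quadratic term is the poly function, which expands a variable into orthogonal polynomial terms; sketched here on mtcars rather than the Titanic data:

```r
# poly(wt, 2) adds a linear and a quadratic term for wt
quad_fit <- lm(mpg ~ poly(wt, 2), data = mtcars)
names(coef(quad_fit))
# "(Intercept)" "poly(wt, 2)1" "poly(wt, 2)2"
# I(wt^2) would add a raw (non-orthogonal) quadratic term instead
```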
Exercise 8
Add categorical variables. Add Sex as an independent variable to the current model specified in glm_model. Note that Sex is not a numeric variable.
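When a factor (or a variable wrapped in as.factor) enters a formula, R expands it into dummy variables, one per non-reference level; a small sketch on mtcars:

```r
# cyl takes the values 4, 6, 8; as.factor treats it as categorical
fac_fit <- lm(mpg ~ as.factor(cyl), data = mtcars)
names(coef(fac_fit))
# "(Intercept)" "as.factor(cyl)6" "as.factor(cyl)8" -- level 4 is the reference
```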
Exercise 9
Now that we have found a good model that fits our data, it's time to use the predict function to see how well the model predicts on our own data. Use the function predict to find the predictions of the model on DATA and save them in Pred.default
Exercise 10
Pred.default shows the predicted values under the link transformation, in this case logit. This is not easily interpretable; to fix this problem, we can specify the type of prediction we want.
    Obtain the predictions as probability values.
    Extra: What's the percent accuracy of this model if we assign died (0) when the predicted probability is less than 0.5 and survived (1) otherwise?
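The relationship between the two prediction scales can be sketched on mtcars: type = 'response' applies the inverse link (here plogis, the inverse logit) to the link-scale predictions.

```r
fit <- glm(am ~ wt, data = mtcars, family = binomial)
link_pred <- predict(fit)                     # log-odds (link) scale
prob_pred <- predict(fit, type = "response")  # probability scale
# Response-scale predictions are the inverse logit of the link-scale ones
all.equal(prob_pred, plogis(link_pred))
```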
_______________________________________________


Below are the solutions to these exercises on generalized linear models.
if (!'titanic' %in% installed.packages()) install.packages('titanic')
library(titanic)

## Warning: package 'titanic' was built under R version 3.3.3
DATA <- titanic_train[,-c(1,4,9,11)]

####################
#                  #
#    Exercise 1    #
#                  #
####################

(lm_reg <- lm(formula = Survived ~ Age + Fare, data = DATA))

## 
## Call:
## lm(formula = Survived ~ Age + Fare, data = DATA)
## 
## Coefficients:
## (Intercept)          Age         Fare  
##    0.420973    -0.003517     0.002583
(glm_model <- glm(formula = Survived ~ Age + Fare, data = DATA, family = gaussian))

## 
## Call:  glm(formula = Survived ~ Age + Fare, family = gaussian, data = DATA)
## 
## Coefficients:
## (Intercept)          Age         Fare  
##    0.420973    -0.003517     0.002583  
## 
## Degrees of Freedom: 713 Total (i.e. Null);  711 Residual
##   (177 observations deleted due to missingness)
## Null Deviance:     172.2 
## Residual Deviance: 158   AIC: 957.2
####################
#                  #
#    Exercise 2    #
#                  #
####################

summary(lm_reg)

## 
## Call:
## lm(formula = Survived ~ Age + Fare, data = DATA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0336 -0.3675 -0.3110  0.5563  0.7829 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.4209734  0.0409896  10.270  < 2e-16 ***
## Age         -0.0035166  0.0012209  -2.880  0.00409 ** 
## Fare         0.0025834  0.0003351   7.708  4.3e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4714 on 711 degrees of freedom
##   (177 observations deleted due to missingness)
## Multiple R-squared:  0.08263, Adjusted R-squared:  0.08005 
## F-statistic: 32.02 on 2 and 711 DF,  p-value: 4.837e-14
summary(glm_model)

## 
## Call:
## glm(formula = Survived ~ Age + Fare, family = gaussian, data = DATA)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0336  -0.3675  -0.3110   0.5563   0.7829  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.4209734  0.0409896  10.270  < 2e-16 ***
## Age         -0.0035166  0.0012209  -2.880  0.00409 ** 
## Fare         0.0025834  0.0003351   7.708  4.3e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.2221983)
## 
##     Null deviance: 172.21  on 713  degrees of freedom
## Residual deviance: 157.98  on 711  degrees of freedom
##   (177 observations deleted due to missingness)
## AIC: 957.25
## 
## Number of Fisher Scoring iterations: 2
####################
#                  #
#    Exercise 3    #
#                  #
####################

(bin_model <- glm(formula = Survived ~ Age + Fare, data = DATA, family = binomial))

## 
## Call:  glm(formula = Survived ~ Age + Fare, family = binomial, data = DATA)
## 
## Coefficients:
## (Intercept)          Age         Fare  
##    -0.41706     -0.01758      0.01726  
## 
## Degrees of Freedom: 713 Total (i.e. Null);  711 Residual
##   (177 observations deleted due to missingness)
## Null Deviance:     964.5 
## Residual Deviance: 891.3   AIC: 897.3
####################
#                  #
#    Exercise 4    #
#                  #
####################

binomial()

## 
## Family: binomial 
## Link function: logit

(bin_probit_model <- glm(formula = Survived ~ Age + Fare, data = DATA, family = binomial(link = probit)))

## 
## Call:  glm(formula = Survived ~ Age + Fare, family = binomial(link = probit), 
##     data = DATA)
## 
## Coefficients:
## (Intercept)          Age         Fare  
##    -0.24598     -0.01028      0.00933  
## 
## Degrees of Freedom: 713 Total (i.e. Null);  711 Residual
##   (177 observations deleted due to missingness)
## Null Deviance:     964.5 
## Residual Deviance: 894.4   AIC: 900.4
####################
#                  #
#    Exercise 5    #
#                  #
####################

(bin_model_no_int <- glm(formula = Survived ~ 0 + Age + Fare, data = DATA, family = binomial(link = logit)))

## 
## Call:  glm(formula = Survived ~ 0 + Age + Fare, family = binomial(link = logit), 
##     data = DATA)
## 
## Coefficients:
##      Age      Fare  
## -0.02805   0.01594  
## 
## Degrees of Freedom: 714 Total (i.e. Null);  712 Residual
##   (177 observations deleted due to missingness)
## Null Deviance:     989.8 
## Residual Deviance: 896.4   AIC: 900.4
####################
#                  #
#    Exercise 6    #
#                  #
####################

(bin_model <- glm(formula = Survived ~ Age + Fare, data = DATA, family = binomial(link = logit), na.action = 'na.omit'))

## 
## Call:  glm(formula = Survived ~ Age + Fare, family = binomial(link = logit), 
##     data = DATA, na.action = "na.omit")
## 
## Coefficients:
## (Intercept)          Age         Fare  
##    -0.41706     -0.01758      0.01726  
## 
## Degrees of Freedom: 713 Total (i.e. Null);  711 Residual
##   (177 observations deleted due to missingness)
## Null Deviance:     964.5 
## Residual Deviance: 891.3   AIC: 897.3

Impute <- median(DATA$Age, na.rm = TRUE)
DATA$Age[is.na(DATA$Age)] <- Impute

(bin_model_Impute <- glm(formula = Survived ~ Age + Fare, data = DATA, family = binomial(link = logit), na.action = 'na.fail'))

## 
## Call:  glm(formula = Survived ~ Age + Fare, family = binomial(link = logit), 
##     data = DATA, na.action = "na.fail")
## 
## Coefficients:
## (Intercept)          Age         Fare  
##    -0.47997     -0.01682      0.01620  
## 
## Degrees of Freedom: 890 Total (i.e. Null);  888 Residual
## Null Deviance:     1187 
## Residual Deviance: 1109   AIC: 1115
####################
#                  #
#    Exercise 7    #
#                  #
####################

(bin_model <- glm(formula = Survived ~ Age + poly(Fare, 2), data = DATA, family = binomial(link = logit)))

## 
## Call:  glm(formula = Survived ~ Age + poly(Fare, 2), family = binomial(link = logit), 
##     data = DATA)
## 
## Coefficients:
##    (Intercept)             Age  poly(Fare, 2)1  poly(Fare, 2)2  
##        0.05118        -0.01812        18.41909       -10.24135  
## 
## Degrees of Freedom: 890 Total (i.e. Null);  887 Residual
## Null Deviance:     1187 
## Residual Deviance: 1097   AIC: 1105

summary(bin_model)

## 
## Call:
## glm(formula = Survived ~ Age + poly(Fare, 2), family = binomial(link = logit), 
##     data = DATA)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3643  -0.8806  -0.8030   1.2209   1.8474  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      0.051181   0.180492   0.284  0.77674    
## Age             -0.018117   0.005714  -3.171  0.00152 ** 
## poly(Fare, 2)1  18.419094   2.554456   7.211 5.57e-13 ***
## poly(Fare, 2)2 -10.241350   2.229437  -4.594 4.35e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1186.7  on 890  degrees of freedom
## Residual deviance: 1096.6  on 887  degrees of freedom
## AIC: 1104.6
## 
## Number of Fisher Scoring iterations: 4
####################
#                  #
#    Exercise 8    #
#                  #
####################

(bin_model <- glm(formula = Survived ~ Age + poly(Fare, 2) + as.factor(Sex), data = DATA, family = binomial(link = logit)))

## 
## Call:  glm(formula = Survived ~ Age + poly(Fare, 2) + as.factor(Sex), 
##     family = binomial(link = logit), data = DATA)
## 
## Coefficients:
##        (Intercept)                 Age      poly(Fare, 2)1  
##            1.28411            -0.01077            15.37627  
##     poly(Fare, 2)2  as.factor(Sex)male  
##           -6.59275            -2.37887  
## 
## Degrees of Freedom: 890 Total (i.e. Null);  886 Residual
## Null Deviance:     1187 
## Residual Deviance: 877   AIC: 887

summary(bin_model)

## 
## Call:
## glm(formula = Survived ~ Age + poly(Fare, 2) + as.factor(Sex), 
##     family = binomial(link = logit), data = DATA)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.4390  -0.6053  -0.5619   0.7994   2.0913  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         1.284114   0.225901   5.684 1.31e-08 ***
## Age                -0.010767   0.006558  -1.642  0.10066    
## poly(Fare, 2)1     15.376271   2.770311   5.550 2.85e-08 ***
## poly(Fare, 2)2     -6.592748   2.521885  -2.614  0.00894 ** 
## as.factor(Sex)male -2.378874   0.171822 -13.845  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1186.66  on 890  degrees of freedom
## Residual deviance:  877.04  on 886  degrees of freedom
## AIC: 887.04
## 
## Number of Fisher Scoring iterations: 4
####################
#                  #
#    Exercise 9    #
#                  #
####################

DATA$Pred.default <- predict(bin_model)

####################
#                  #
#    Exercise 10   #
#                  #
####################

DATA$Prob <- predict(bin_model, type = 'response')
DATA$Pred <- ifelse(DATA$Prob < .5, 0, 1)

sum(DATA$Pred == DATA$Survived) / nrow(DATA)
## [1] 0.7822671
 
http://www.r-exercises.com/2017/09/16/generalized-linear-functions-beginners/
http://www.r-exercises.com/2017/09/16/generalized-linear-models-solutionbeginners/
 
