R Interview Questions
R is one of the most popular programming languages for performing statistical analysis and predictive modeling. Many recent surveys and studies claim that R holds a good percentage of market share in the analytics industry. A data scientist role generally requires a candidate to know the R or Python programming language. People who know R are generally paid more than Python and SAS programmers. In terms of advancement, R has improved a lot in recent years. It supports parallel computing and integration with big data technologies.
The following is a list of the most frequently asked R programming interview questions with detailed answers. It includes basic, advanced and tricky questions related to R. It also covers interview questions related to data science with R.
1. How to determine the data type of an object?
class() is used to determine the data type of an object. See the example below -
x <- factor(1:5)
class(x)
It returns "factor".
To determine the structure of an object, use the str() function :
str(x) returns Factor w/ 5 levels "1","2","3","4","5": 1 2 3 4 5
Example 2 :
xx <- data.frame(var1=c(1:5))
class(xx)
It returns "data.frame".
str(xx) returns 'data.frame': 5 obs. of 1 variable: $ var1: int 1 2 3 4 5
2. What is the use of the mode() function?
It returns the storage mode of an object.
x <- factor(1:5)
mode(x)
The above mode() call returns "numeric".
x <- data.frame(var1=c(1:5))
mode(x)
It returns "list".
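To see how the two functions differ, a minimal sketch can run class() and mode() on the same objects. typeof() is not part of the answer above; it is added here only as an extra point of comparison.

```r
# A factor is stored as integer codes plus a "levels" attribute
x <- factor(1:5)
class(x)    # "factor"
mode(x)     # "numeric"
typeof(x)   # "integer"

# A data frame is stored as a list of equal-length columns
xx <- data.frame(var1 = 1:5)
class(xx)   # "data.frame"
mode(xx)    # "list"
typeof(xx)  # "list"
```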
3. Which data structure is used to store categorical variables?
R has a special data structure called "factor" to store categorical variables. Making a variable a factor tells R that it is nominal or ordinal.
gender = c(1,2,1,2,1,2)
gender = factor(gender)
gender
4. How to check the frequency distribution of a categorical variable?
The table() function is used to calculate the count of each category of a categorical variable.
gender = factor(c("m","f","f","m","f","f"))
table(gender)
If you want to include the % of values in each group, you can store the result in a data frame using the data.frame() function and then calculate the column percent.
t = data.frame(table(gender))
t$percent= round(t$Freq / sum(t$Freq)*100,2)
5. How to check the cumulative frequency distribution of a categorical variable
The cumsum() function is used to calculate the cumulative sum of the category counts of a categorical variable.
gender = factor(c("m","f","f","m","f","f"))
x = table(gender)
cumsum(x)
If you want to see the cumulative percentage of values, see the code below :
t = data.frame(table(gender))
t$cumfreq = cumsum(t$Freq)
t$cumpercent= round(t$cumfreq / sum(t$Freq)*100,2)
6. How to produce a histogram
The hist() function is used to produce the histogram of a variable.
df = sample(1:100, 25)
hist(df, right=FALSE)
To improve the layout of the histogram, you can use the code below :
colors = c("red", "yellow", "green", "violet", "orange", "blue", "pink", "cyan")
hist(df, right=FALSE, col=colors, main="Main Title ", xlab="X-Axis Title")
7. How to produce a bar graph
First calculate the frequency distribution with the table() function and then apply the barplot() function to produce a bar graph.
mydata = sample(LETTERS[1:5],16,replace = TRUE)
mydata.count= table(mydata)
barplot(mydata.count)
To improve the layout of the bar graph, you can use the code below :
colors = c("red", "yellow", "green", "violet", "orange", "blue", "pink", "cyan")
barplot(mydata.count, col=colors, main="Main Title ", xlab="X-Axis Title")
8. How to produce a pie chart
First calculate the frequency distribution with the table() function and then apply the pie() function to produce a pie chart.
9. If you run z <- x*y on the vectors below, what would be the output? What would be the length of z?
x <- c(4,5,6)
y <- c(2,3)
z <- x*y
It returns 8 15 12 with a warning message that the longer object length is not a multiple of the shorter object length. The length of z is 3 as it has three elements.
First Step : It multiplies the first element of vector x i.e. 4 with the first element of vector y i.e. 2 and the result is 8. In the second step, it multiplies the second element of vector x i.e. 5 with the second element of vector y i.e. 3, and the result is 15. In the next step, R multiplies the first element of the smaller vector (y) with the last element of the bigger vector x, which gives 6 * 2 = 12.
Suppose the vector x contained four elements as shown below :
x <- c(4,5,6,7)
y <- c(2,3)
x*y
It returns 8 15 12 21. It works like this : (4*2) (5*3) (6*2) (7*3)
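The recycling rule can be made explicit with rep_len(), which stretches the shorter vector by hand. This is only an illustrative sketch; the name z_explicit is introduced here and suppressWarnings() is used only to keep the output quiet.

```r
x <- c(4, 5, 6)
y <- c(2, 3)

# R recycles the shorter vector and warns because 3 is not a multiple of 2
z <- suppressWarnings(x * y)
z          # 8 15 12
length(z)  # 3

# The same result with the recycling written out by hand
z_explicit <- x * rep_len(y, length(x))
identical(z, z_explicit)  # TRUE
```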
10. What are the different data structures in R?
R contains primarily the following data structures :
- Vector
- Matrix
- Array
- List
- Data frame
- Factor
The first three data structures (vector, matrix, array) are homogeneous: all contents must be of the same type. The fourth and fifth (list, data frame) are heterogeneous: they allow different types. The factor data structure is used to store categorical variables.
11. How to combine data frames?
Let's prepare 2 vectors for demonstration :
x = c(1:5)
y = c("m","f","f","m","f")
The cbind() function is used to combine data by columns.
z=cbind(x,y)
The rbind() function is used to combine data by rows.
z = rbind(x,y)
While using the cbind() function, make sure the number of rows is equal in both datasets. While using the rbind() function, make sure both the number and the names of columns are the same. If the column names are not the same, rbind() on data frames stops with an error instead of appending the rows.
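The role of column names in rbind() can be checked with a small sketch. The data frames a, b and d are made up for illustration, and tryCatch() is used only to capture the error instead of stopping the script.

```r
a <- data.frame(x = 1:2, y = c("m", "f"))
b <- data.frame(y = "f", x = 3L)      # same names, different order

# rbind() on data frames matches columns by name, not by position
ab <- rbind(a, b)
ab$x  # 1 2 3

# Different column names: rbind() refuses to combine the data frames
d <- data.frame(x = 4L, z = "m")
res <- tryCatch(rbind(a, d), error = function(e) e)
inherits(res, "error")  # TRUE
```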
12. How to combine data by rows when the number of columns differs?
When the number of columns in the datasets is not equal, the rbind() function doesn't work to combine data by rows. For example, we have two data frames df and df2. The data frame df has 2 columns and df2 has only 1 variable. See the code below -
df = data.frame(x = c(1:4), y = c("m","f","f","m"))
df2 = data.frame(x = c(5:8))
The bind_rows() function from the dplyr package can be used to combine data frames when the number of columns does not match.
library(dplyr)
combdf = bind_rows(df,df2)
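If dplyr is not available, a similar result can be sketched in base R by adding the missing columns as NA before calling rbind(). This is an alternative under that assumption, not the method described above; the name missing_cols is hypothetical.

```r
df  <- data.frame(x = 1:4, y = c("m", "f", "f", "m"))
df2 <- data.frame(x = 5:8)

# Add every column that df has but df2 lacks, filled with NA
missing_cols <- setdiff(names(df), names(df2))
df2[missing_cols] <- NA

# Both data frames now share the same columns, so rbind() works
combdf <- rbind(df, df2[names(df)])
nrow(combdf)           # 8
sum(is.na(combdf$y))   # 4
```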
13. What are valid variable names in R?
A valid variable name consists of letters, numbers and the dot or underline characters. A variable name can start with either a letter or the dot followed by a character (not a number).
A variable name such as .1var is not valid. But .var1 is valid.
A variable name cannot be a reserved word. The reserved words are listed below -
if else repeat while function for in next break
TRUE FALSE NULL Inf NaN NA NA_integer_ NA_real_ NA_complex_ NA_character_
A variable name can have a maximum of 10,000 bytes.
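These rules can be checked programmatically with base R's make.names(), which rewrites any invalid name into a valid one. The helper is_valid_name() below is a hypothetical convenience wrapper, not a built-in function.

```r
# Hypothetical helper: a name is valid if make.names() leaves it unchanged
is_valid_name <- function(name) name == make.names(name)

is_valid_name(".var1")   # TRUE
is_valid_name(".1var")   # FALSE
is_valid_name("var_1")   # TRUE
is_valid_name("if")      # FALSE, reserved word

make.names(".1var")      # "X.1var"
```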
14. What is the use of with() and by() functions? What are its alternatives?
Suppose you have a data frame as shown below -
df=data.frame(x=c(1:6), y=c(1,2,4,6,8,12))
You are asked to perform this calculation : (x+y) + (x-y) . Most R programmers write code like the following -
(df$x + df$y) + (df$x - df$y)
Using the with() function, you can refer to your data frame once and make the above code more compact and simpler -
with(df, (x+y) + (x-y))
The with() function plays a role similar to the pipe operator in the dplyr package. See the code below -
library(dplyr)
df %>% mutate((x+y) + (x-y))
by() function in R
The by() function is equivalent to GROUP BY in SQL. It is used to perform a calculation by a factor or a categorical variable. In the example below, we are computing the mean of variable var2 by the factor var1.
df = data.frame(var1=factor(c(1,2,1,2,1,2)), var2=c(10:15))
with(df, by(df, var1, function(x) mean(x$var2)))
The group_by() function in the dplyr package can perform the same task.
library(dplyr)
df %>% group_by(var1)%>% summarise(mean(var2))
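Two further base R alternatives for grouped means are tapply() and aggregate(). The sketch below reuses the same sample data; agg is just an illustrative name.

```r
df <- data.frame(var1 = factor(c(1, 2, 1, 2, 1, 2)), var2 = 10:15)

# tapply() returns one mean per level of var1
group_means <- tapply(df$var2, df$var1, mean)
group_means   # level 1: 12, level 2: 13

# aggregate() returns the same means as a data frame
agg <- aggregate(var2 ~ var1, data = df, FUN = mean)
agg
```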
15. How to rename a variable?
In the example below, we are renaming variable var1 to variable1.
df = data.frame(var1=c(1:5))
colnames(df)[colnames(df) == 'var1'] <- 'variable1'
The rename() function in the dplyr package can also be used to rename a variable.
library(dplyr)
df = data.frame(var1=c(1:5))
df= rename(df, variable1=var1)
16. What is the use of the which() function in R?
The which() function returns the positions of the elements of a logical vector that are TRUE. In the example below, we are figuring out the row number in which the maximum value of a variable x is recorded.
mydata=data.frame(x = c(1,3,10,5,7))
which(mydata$x==max(mydata$x))
It returns 3 as 10 is the maximum value and it is at the third row in the variable x.
17. How to calculate the first non-missing value in variables?
Suppose you have three variables X, Y and Z and you need to extract the first non-missing value in each row of these variables.
data = read.table(text="
X Y Z
NA 1 5
3 NA 2
", header=TRUE)
The coalesce() function in the dplyr package can be used to do this task.
library(dplyr)
data %>% mutate(var=coalesce(X,Y,Z))
18. How to calculate max value for rows?
Let's create a sample data frame
dt1 = read.table(text="
X Y Z
7 NA 5
2 4 5
", header=TRUE)
With the apply() function, we can tell R to apply the max function row-wise. The na.rm = TRUE argument is used to tell R to ignore missing values while calculating the max value. If it is not used, it would return NA.
dt1$var = apply(dt1,1, function(x) max(x,na.rm = TRUE))
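As an alternative sketch, base R's vectorised pmax() gives the same row maximum without apply(). The name row_max is introduced here for illustration.

```r
dt1 <- read.table(text = "
X Y Z
7 NA 5
2 4 5
", header = TRUE)

# pmax() compares the columns element by element, so each result is a row maximum
row_max <- pmax(dt1$X, dt1$Y, dt1$Z, na.rm = TRUE)
row_max  # 7 5
```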
19. Count number of zeros in a row
dt2 = read.table(text="
A B C
8 0 0
6 0 5
", header=TRUE)
apply(dt2,1, function(x) sum(x==0))
20. Does the following code work?
ifelse(df$var1==NA, 0,1)
It does not work. A logical operation on NA returns NA. It does not return TRUE or FALSE.
This code works : ifelse(is.na(df$var1), 0,1)
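The behaviour is easy to verify on a small vector. The vector v below is made up for illustration.

```r
v <- c(5, NA, 7)

# Comparing with NA never yields TRUE or FALSE
v == NA                    # NA NA NA

# is.na() is the reliable test for missing values
is.na(v)                   # FALSE TRUE FALSE
flag <- ifelse(is.na(v), 0, 1)
flag                       # 1 0 1
```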
21. What would be the final value of x after running the following program?
x = 3
mult <- function(j)
{
x = j * 2
return(x)
}
mult(2)
[1] 4
Answer : The value of 'x' will remain 3.
It is because x is defined outside the function, and the assignment inside the function only creates a local x. If you want to change the value of x after running the function, you can use the following program:
x = 3
mult <- function(j)
{
x <<- j * 2
return(x)
}
mult(2)
x
The operator "<<-" tells R to search in the parent environment for an existing definition of the variable we want to assign.
22. How to convert a factor variable to numeric
The as.numeric() function returns the internal integer codes of the factor levels and not the original values. Hence, it is required to convert a factor variable to character before converting it to numeric.
a <- factor(c(5, 6, 7, 7, 5))
a1 = as.numeric(as.character(a))
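The difference shows up directly when both conversions are run side by side. The third form, indexing the numeric levels by the factor, is an additional variant not mentioned above.

```r
a <- factor(c(5, 6, 7, 7, 5))

as.numeric(a)                  # 1 2 3 3 1, the internal level codes
as.numeric(as.character(a))    # 5 6 7 7 5, the original values
as.numeric(levels(a))[a]       # 5 6 7 7 5, same result via the levels
```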
23. How to concatenate two strings?
The paste() function is used to join two strings. A single space is the default separator between two strings.
a = "Deepanshu"
b = "Bhalla"
paste(a, b)
It returns "Deepanshu Bhalla"
If you want to change the default single space separator, you can add the sep="," argument to use a comma as the separator.
paste(a, b, sep=",") returns "Deepanshu,Bhalla"
24. How to extract the first 3 characters from a word
The substr() function is used to extract substrings from a character vector. The syntax of the substr() function is substr(character_vector, starting_position, end_position)
x = "AXZ2016"
substr(x,1,3)
25. How to extract the last name from a full name
The last name is the end string of the name. For example, Jhonson is the last name of "Dave,Jon,Jhonson".
dt2 = read.table(text="
var
Sandy,Jones
Dave,Jon,Jhonson
", header=TRUE)
The word() function of the stringr package is used to extract a word from a string. -1 in the second parameter denotes the last word.
library(stringr)
dt2$var2 = word(dt2$var, -1, sep = ",")
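Without stringr, a base R sketch using a regular expression gives the same last names. The vector full_names simply repeats the sample values.

```r
full_names <- c("Sandy,Jones", "Dave,Jon,Jhonson")

# The greedy pattern ".*," consumes everything up to the last comma
last_names <- sub(".*,", "", full_names)
last_names  # "Jones" "Jhonson"
```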
26. How to remove leading and trailing spaces
The trimws() function is used to remove leading and trailing spaces.
a = " David Banes "
trimws(a)
It returns "David Banes".
27. How to generate random numbers between 1 and 100
The runif() function is used to generate random numbers from a uniform distribution.
rand = runif(100, min = 1, max = 100)
28. How to apply LEFT JOIN in R?
LEFT JOIN implies keeping all rows from the left table (data frame) with the matching rows from the right table. In the merge() function, all.x=TRUE denotes a left join.
df1=data.frame(ID=c(1:5), Score=runif(5,50,100))
df2=data.frame(ID=c(3,5,7:9), Score2=runif(5,1,100))
comb = merge(df1, df2, by ="ID", all.x = TRUE)
Left Join (SQL Style)
library(sqldf)
comb = sqldf('select df1.*, df2.* from df1 left join df2 on df1.ID = df2.ID')
Left Join with dplyr package
library(dplyr)
comb = left_join(df1, df2, by = "ID")
29. How to calculate the cartesian product of two datasets
The cartesian product implies the cross product of two tables (data frames). For example, df1 has 5 rows and df2 has 5 rows. The combined table would contain 25 rows (5*5)
comb = merge(df1,df2,by=NULL)
CROSS JOIN (SQL Style)
library(sqldf)
comb2 = sqldf('select * from df1 join df2 ')
30. Unique rows common to both the datasets
First, create two sample data frames
df1=data.frame(ID=c(1:5), Score=c(50:54))
df2=data.frame(ID=c(3,5,7:9), Score=c(52,60:63))
library(dplyr)
comb = intersect(df1,df2)
library(sqldf)
comb2 = sqldf('select * from df1 intersect select * from df2 ')
31. How to measure the execution time of a program in R?
There are multiple ways to measure the running time of code. Some frequently used methods are listed below -
R Base Method
start.time <- Sys.time()
runif(5555,1,1000)
end.time <- Sys.time()
end.time - start.time
With tictoc package
library(tictoc)
tic()
runif(5555,1,1000)
toc()
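Another base R option, not listed above, is system.time(), which wraps an expression and reports the timing in one call. The name timing is illustrative, and the actual numbers will vary by machine.

```r
# system.time() evaluates the expression and reports user, system and elapsed time
timing <- system.time(runif(5555, 1, 1000))
timing["elapsed"]
```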
32. Which package is generally used for fast data manipulation on large datasets?
The data.table package performs fast data manipulation on large datasets. See the comparison between dplyr and data.table.
# Load required packages
library(tictoc)
library(dplyr)
library(data.table)
# Load data
library(nycflights13)
data(flights)
df = setDT(flights)
# Using data.table package
tic()
df[arr_delay > 30 & dest == "IAH",
   .(avg = mean(arr_delay),
     size = .N),
   by = carrier]
toc()
# Using dplyr package
tic()
flights %>% filter(arr_delay > 30 & dest == "IAH") %>%
  group_by(carrier) %>% summarise(avg = mean(arr_delay), size = n())
toc()
Result : The data.table package took 0.04 seconds, whereas the dplyr package took 0.07 seconds. So data.table took approx. 40% less time than dplyr. Since the dataset used in the example is of medium size, there is no noticeable difference between the two. As the size of the data grows, the difference in execution time gets bigger.
33. How to read a large CSV file in R?
We can use the fread() function of the data.table package.
library(data.table)
yyy = fread("C:\\Users\\Dave\\Example.csv", header = TRUE)
We can also use the read.big.matrix() function of the bigmemory package.
34. What is the difference between the following two programs ?
1. temp = data.frame(v1<-c(1:10),v2<-c(5:14))
2. temp = data.frame(v1=c(1:10),v2=c(5:14))
In the first case, it creates two vectors v1 and v2 and a data frame temp which has two variables with improper variable names. The second code creates a data frame temp with proper variable names.
35. How to remove all the objects
rm(list=ls())
36. What are the various sorting algorithms in R?
Five major sorting algorithms :
- Bubble Sort
- Selection Sort
- Merge Sort
- Quick Sort
- Bucket Sort
37. Sort data by multiple variables
Create a sample data frame
mydata = data.frame(score = ifelse(sign(rnorm(25))==-1,1,2),
experience= sample(1:25))
Task : You need to sort the score variable in ascending order and then sort the experience variable in descending order.
R Base Method
mydata1 <- mydata[order(mydata$score, -mydata$experience),]
With dplyr package
library(dplyr)
mydata1 = arrange(mydata, score, desc(experience))
38. Drop Multiple Variables
Suppose you need to remove 3 variables - x, y and z from data frame "mydata".
R Base Method
df = subset(mydata, select = -c(x,y,z))
With dplyr package
library(dplyr)
df = select(mydata, -c(x,y,z))
40. How to save everything in an R session
save.image(file="dt.RData")
41. How does R handle missing values?
Missing values are represented by capital NA.
To create new data without any missing values, you can use the code below :
df <- na.omit(mydata)
42. How to remove duplicate values by a column
Suppose you have data consisting of 25 records. You are asked to remove duplicates based on a column. In the example, we are eliminating duplicates by variable y.
data = data.frame(y=sample(1:25, replace = TRUE), x=rnorm(25))
R Base Method
test = subset(data, !duplicated(data[,"y"]))
dplyr Method
library(dplyr)
test1 = distinct(data, y, .keep_all= TRUE)
43. Which packages are used for transposing data with R
The reshape2 and tidyr packages are the most popular packages for reshaping data in R.
44. Calculate number of hours, days, weeks, months and years between two dates
Let's set two dates :
dates <- as.Date(c("2015-09-02", "2016-09-05"))
difftime(dates[2], dates[1], units = "hours")
difftime(dates[2], dates[1], units = "days")
floor(difftime(dates[2], dates[1], units = "weeks"))
floor(difftime(dates[2], dates[1], units = "days")/365)
With lubridate package
The months unit is not included in the base difftime() function, so we can use the interval() function of the lubridate package.
library(lubridate)
interval(dates[1], dates[2]) %/% hours(1)
interval(dates[1], dates[2]) %/% days(1)
interval(dates[1], dates[2]) %/% weeks(1)
interval(dates[1], dates[2]) %/% months(1)
interval(dates[1], dates[2]) %/% years(1)
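If lubridate is not installed, the number of whole months can be approximated in base R from the POSIXlt components. This is a rough sketch under that assumption, not the lubridate algorithm; the names lt and months_between are hypothetical.

```r
dates <- as.Date(c("2015-09-02", "2016-09-05"))
lt <- as.POSIXlt(dates)

# Whole months: year and month difference, minus one if the day of month is not yet reached
months_between <- (lt$year[2] - lt$year[1]) * 12 + (lt$mon[2] - lt$mon[1]) -
  as.integer(lt$mday[2] < lt$mday[1])
months_between  # 12

# Whole days between the same two dates
days_between <- as.numeric(difftime(dates[2], dates[1], units = "days"))
days_between    # 369
```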
45. How to add 3 months to a date
library(lubridate)
mydate <- as.Date("2015-09-02")
mydate + months(3)
46. Extract date and time from timestamp
mydate <- as.POSIXlt("2015-09-27 12:02:14")
library(lubridate)
date(mydate) # Extracting date part
format(mydate, format="%H:%M:%S") # Extracting time part
Extracting various time periods
day(mydate)
month(mydate)
year(mydate)
hour(mydate)
minute(mydate)
second(mydate)
47. What are the various ways to write a loop in R
There are primarily three ways to write a loop in R
- For Loop
- While Loop
- Apply Family of Functions such as apply, lapply, sapply etc
48. Difference between lapply and sapply in R
lapply returns a list when we apply a function to each element of a data structure, whereas sapply simplifies the result to a vector (or matrix) when it can.
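A small sketch makes the difference visible, including the case where sapply() cannot simplify and falls back to a list. All object names here are illustrative.

```r
squares_list <- lapply(1:3, function(i) i^2)
squares_vec  <- sapply(1:3, function(i) i^2)

is.list(squares_list)   # TRUE
squares_vec             # 1 4 9

# When the results have different lengths, sapply() cannot simplify and returns a list
ragged <- sapply(1:3, function(i) seq_len(i))
is.list(ragged)         # TRUE
```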
49. Difference between sort(), rank() and order() functions?
The sort() function is used to sort a one-dimensional vector or a single variable of data.
The rank() function returns the ranking of each value.
The order() function returns the indices that can be used to sort the data.
Example :
set.seed(1234)
x = sample(1:50, 10)
x
[1] 6 31 30 48 40 29 1 10 28 22
sort(x)
[1] 1 6 10 22 28 29 30 31 40 48
It sorts the data in ascending order.
rank(x)
[1] 2 8 7 10 9 6 1 3 5 4
2 implies the number in the first position is the 2nd lowest and 8 implies the number in the 2nd position is the 8th lowest.
order(x)
[1] 7 1 8 10 9 6 3 2 5 4
7 implies the 7th value of x is the smallest value, so 7 is the first element of order(x), and 1 implies the first value of x is the 2nd smallest.
If you run x[order(x)], it would give you the same result as the sort() function. The difference between these two functions lies in two or more dimensions of data (two or more columns). In other words, the sort() function cannot be used for more than one dimension whereas x[order(x)] can be used.
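The multi-column case can be sketched with a small made-up data frame: order() accepts several keys, which sort() cannot do. The data frame scores and its columns are hypothetical.

```r
scores <- data.frame(team  = c("b", "a", "b", "a"),
                     score = c(10, 30, 20, 25))

# Sort by team ascending, then by score descending within each team
idx <- order(scores$team, -scores$score)
idx             # 2 4 3 1
scores[idx, ]

# On a single vector, indexing with order() matches sort()
x <- c(6, 31, 30, 48, 40, 29, 1, 10, 28, 22)
identical(x[order(x)], sort(x))   # TRUE
```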
50. Extracting Numeric Variables
cols <- sapply(mydata, is.numeric)
abc = mydata [,cols]
Data Science with R Interview Questions
The list below contains the most frequently asked interview questions for a data scientist role. Most roles related to data science or predictive modeling require a candidate to be well conversant with R and to know how to develop and validate predictive models with R.
51. Which function is used for building a linear regression model?
The lm() function is used for fitting a linear regression model.
52. How to add an interaction in the linear regression model?
An interaction can be created using the colon sign (:). For example, x1 and x2 are two predictors (independent variables). The interaction between the variables can be formed like x1:x2.
See the example below -
linreg1 <- lm(y ~ x1 + x2 + x1:x2, data=mydata)
The above code is equivalent to the following code :
linreg1 <- lm(y ~ x1*x2, data=mydata)
x1*x2 - It implies including both main effects (x1 + x2) and interaction (x1:x2).
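The equivalence of the two formulas can be confirmed on simulated data. Everything in this sketch (the data, the coefficients used to simulate y, and the names fit_long and fit_short) is made up for illustration.

```r
set.seed(1)
# Simulated data, purely for illustration
mydata <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
mydata$y <- 1 + 2 * mydata$x1 - mydata$x2 + 0.5 * mydata$x1 * mydata$x2 + rnorm(50)

fit_long  <- lm(y ~ x1 + x2 + x1:x2, data = mydata)
fit_short <- lm(y ~ x1 * x2, data = mydata)

# Both formulas expand to the same model, so the coefficients match
all.equal(coef(fit_long), coef(fit_short))  # TRUE
```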
53. How to check the autocorrelation assumption for linear regression?
durbinWatsonTest() function of the car package
54. Which function is useful for developing a binary logistic regression model?
glm() function with family = "binomial"
55. How to perform stepwise variable selection in a logistic regression model?
Run the step() function after building the logistic model with the glm() function.
56. How to do scoring in the logistic regression model?
Run predict(logit_model, validation_data, type = "response")
57. How to split data into training and validation?
dt = sort(sample(nrow(mydata), nrow(mydata)*.7))
train<-mydata[dt,]
val<-mydata[-dt,]
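A reproducible variant of the same 70/30 split is sketched below. The seed, the 100-row mydata and the use of round() are assumptions added for illustration; drop = FALSE only keeps the one-column example as a data frame.

```r
set.seed(42)                      # makes the random split reproducible
mydata <- data.frame(id = 1:100)  # hypothetical 100-row data set

dt    <- sort(sample(nrow(mydata), round(nrow(mydata) * 0.7)))
train <- mydata[dt, , drop = FALSE]
val   <- mydata[-dt, , drop = FALSE]

nrow(train)                          # 70
nrow(val)                            # 30
length(intersect(train$id, val$id))  # 0, no row lands in both sets
```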
58. How to standardize variables?
data2 = scale(data)
59. How to validate cluster analysis
Validate Cluster Analysis
60. Which are the popular R packages for decision trees?
rpart, party
61. What is the difference between the rpart and party packages for developing a decision tree model?
rpart is based on the Gini index, which measures impurity in a node, whereas the ctree() function from the "party" package uses a significance test procedure in order to select variables.
62. How to check correlation with R?
cor() function
63. Have you heard of the 'relaimpo' package?
It is used to measure the relative importance of independent variables in a model.
64. How to fine tune a random forest model?
Use the tuneRF() function
65. What does shrinkage define in a gradient boosting model?
Shrinkage is used for reducing, or shrinking, the impact of each additional fitted base-learner (tree).
66. How to make data stationary for an ARIMA time series model?
Use the ndiffs() function, which returns the number of differences required to make data stationary.
67. How to automate an ARIMA model?
Use the auto.arima() function of the forecast package
68. How to fit a proportional hazards model in R?
Use the coxph() function of the survival package.
69. Which package is used for market basket analysis?
arules package
70. Parallelizing Machine Learning Algorithms
Link : Parallelizing Machine Learning