Branches of mechanical engineering: Density-Based Clustering Exercises + Solutions

Density-based clustering is a technique that divides data into groups with similar characteristics (clusters) but does not require specifying the number of those groups in advance. In density-based clustering, clusters are defined as dense regions of data points separated by low-density regions. Density is measured by the number of data points within some radius.
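To make the idea of density concrete, here is a minimal base-R sketch that counts, for every point, how many points lie within a radius eps of it (the point itself included). The toy points and the value eps = 0.2 are made up for illustration; a point in a dense region gets a high count, an isolated point a low one.

```r
# Toy 2-D data: a dense group near the origin plus one isolated point
pts <- rbind(
  c(0.0, 0.0), c(0.1, 0.0), c(0.0, 0.1), c(0.1, 0.1),
  c(5.0, 5.0)
)

eps <- 0.2                                  # illustrative radius
d <- as.matrix(dist(pts))                   # Euclidean distances between all pairs
n_neighbours <- unname(rowSums(d <= eps))   # points inside each eps-ball, self included
print(n_neighbours)
# [1] 4 4 4 4 1
```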
Advantages of density-based clustering:

as mentioned above, it does not require a predefined number of clusters,
clusters can be of any shape, including non-spherical ones,
the technique is able to identify noise data (outliers).

Disadvantages:

density-based clustering fails if there are no density drops between clusters,
it is also sensitive to the parameters that define density (the radius and the minimum number of points); proper parameter setting may require domain knowledge.

There are different methods of density-based clustering. The most popular are DBSCAN (density-based spatial clustering of applications with noise), which assumes constant density of clusters, OPTICS (ordering points to identify the clustering structure), which allows for varying density, and “mean-shift”.
This set of exercises covers basic techniques for using the DBSCAN method, and allows you to compare its results to the results of the k-means clustering algorithm by means of silhouette analysis.
The set requires the packages dbscan, cluster, and factoextra to be installed. The exercises make use of the iris data set, which is supplied with R, and the wholesale customers data set from the University of California, Irvine (UCI) machine learning repository (download here).
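The core idea of DBSCAN can be sketched in base R. This is a simplified illustration written for this post (the function name simple_dbscan is hypothetical), not the code used by the dbscan package: a point with at least minPts points within eps (here counting the point itself) is a core point, clusters grow outward from core points through their eps-neighbourhoods, and points reached by no cluster keep the label 0, which the dbscan package also uses for noise.

```r
simple_dbscan <- function(x, eps, minPts) {
  d <- as.matrix(dist(x))
  n <- nrow(d)
  # eps-neighbourhood (row indices) of every point, the point itself included
  neighbours <- lapply(seq_len(n), function(i) which(d[i, ] <= eps))
  is_core <- lengths(neighbours) >= minPts
  cluster <- integer(n)            # 0 = unassigned; whatever stays 0 is noise
  current <- 0L
  for (i in seq_len(n)) {
    if (!is_core[i] || cluster[i] != 0L) next
    current <- current + 1L        # start a new cluster from an unvisited core point
    cluster[i] <- current
    queue <- neighbours[[i]]
    while (length(queue) > 0) {
      j <- queue[1]
      queue <- queue[-1]
      if (cluster[j] == 0L) {
        cluster[j] <- current
        # only core points extend the cluster further; border points do not
        if (is_core[j]) queue <- c(queue, neighbours[[j]])
      }
    }
  }
  cluster
}

# Two tight groups and one far-away point
toy <- rbind(
  cbind(c(0, 0.1, 0.2, 0, 0.1), c(0, 0, 0, 0.1, 0.1)),
  cbind(c(3, 3.1, 3.2, 3, 3.1), c(3, 3, 3, 3.1, 3.1)),
  c(10, 10)
)
toy_clusters <- simple_dbscan(toy, eps = 0.25, minPts = 3)
print(toy_clusters)
# [1] 1 1 1 1 1 2 2 2 2 2 0
```

As in DBSCAN itself, a border point within eps of core points from two clusters goes to whichever cluster reaches it first, so the result can depend on point order.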
Answers to the exercises are available here.

Exercise 1
Create a new data frame using all but the last variable from the iris data set, which is supplied with R.

Exercise 2
Use the scale function to normalize values of all variables in the new data set (with default settings). Ensure that the resulting object is of class data.frame.

Exercise 3
Plot the distribution of distances between data points and their 5th nearest neighbors using the kNNdistplot function from the dbscan package.
Examine the plot and find a tentative threshold at which distances start increasing quickly. On the same plot, draw a horizontal line at the level of the threshold.
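What kNNdistplot draws can be approximated in base R: for every point, the distance to its k-th nearest other point, sorted in increasing order. The sketch below (the function name knn_dist_sorted is hypothetical) recomputes that curve for the scaled iris data; the "knee" where the curve turns upward is the tentative eps threshold. The line at 0.8 mirrors the value used in the solution to Exercise 3.

```r
knn_dist_sorted <- function(x, k = 5) {
  d <- as.matrix(dist(x))
  # sort each row; the smallest entry is the point itself (distance 0),
  # so the k-th nearest *other* point is at position k + 1
  kth <- apply(d, 1, function(row) sort(row)[k + 1])
  sort(unname(kth))
}

x <- scale(iris[, -5])                      # same preprocessing as Exercises 1-2
kd <- knn_dist_sorted(x, k = 5)
plot(kd, type = "l",
     xlab = "Points sorted by distance", ylab = "5-NN distance")
abline(h = 0.8, col = "red")                # candidate eps threshold
```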

Exercise 4
Use the dbscan function from the package of the same name to find density-based clusters in the data. Set the size of the epsilon neighborhood at the level of the threshold found in Exercise 3, and set the minimum number of points in the epsilon region equal to 5.
Assign the value returned by the function to an object, and print that object.

Exercise 5
Plot the clusters with the fviz_cluster function from the factoextra package. Choose the geometry type to draw only points on the graph, and set the ellipse parameter so that no outline is drawn around the points of each cluster.
(Note that the fviz_cluster function produces a 2-dimensional plot. If the data set contains two variables, those variables are used for plotting; if the number of variables is bigger, the first two principal components are drawn.)
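A rough base-R equivalent of that projection uses prcomp from the stats package. This is only a sketch of the idea; fviz_cluster's exact preprocessing may differ, and the points are coloured by species purely to show the projection, not as a clustering result.

```r
# Standardise the four numeric iris columns, as in Exercise 2
x <- scale(iris[, -5])

# Principal component analysis (x is already centred and scaled)
pca <- prcomp(x)

# Coordinates of every observation on the first two principal components
scores <- pca$x[, 1:2]

# Share of total variance captured by each component
var_share <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_share[1:2], 3))

plot(scores, col = as.integer(iris$Species), pch = 19,
     xlab = "PC1", ylab = "PC2")
```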


Exercise 6
Examine the structure of the cluster object obtained in Exercise 4, and find the vector with cluster assignments. Make a copy of the data set, add the vector of cluster assignments to the data set, and print its first few lines.

Exercise 7
Now look at what happens if you change the epsilon value.

Plot once again the distribution of distances between data points and their 5th nearest neighbors (with the kNNdistplot function, as in Exercise 3). On that plot, draw horizontal lines at levels 1.8, 0.5, and 0.4.
Use the dbscan function to find clusters in the data with epsilon set to each of these values (as in Exercise 4).
Plot the results (as in Exercise 5, but now set the ellipse parameter so that an outline around the points is drawn).

Exercise 8
This exercise shows how the DBSCAN algorithm can be used to find outliers:

Load the Wholesale customers data set, and delete all variables except Fresh and Milk. Assign the data set to the customers variable.
Discover clusters using the steps from Exercises 2-5: scale the data, choose an epsilon value, find clusters, and plot them. Set the minimum number of points to 5. Use the db_clusters_customers variable to store the output of the dbscan function.
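The outliers DBSCAN reports are its noise points: points that are neither core points nor within eps of any core point. The base-R sketch below (the function name dbscan_noise is hypothetical) flags exactly those points; the toy data are made up so that one point is a border point and one is noise.

```r
dbscan_noise <- function(x, eps, minPts) {
  in_eps <- as.matrix(dist(x)) <= eps          # TRUE where two points are within eps
  is_core <- rowSums(in_eps) >= minPts         # enough neighbours (self included)
  # border points lie within eps of at least one core point
  near_core <- colSums(in_eps[is_core, , drop = FALSE]) > 0
  unname(!is_core & !near_core)                # neither core nor border: noise
}

toy <- rbind(
  cbind(c(0, 0.1, 0.2, 0, 0.1), c(0, 0, 0, 0.1, 0.1)),  # dense group
  c(0.4, 0),   # border point: close to (0.2, 0) but too few neighbours of its own
  c(5, 5)      # isolated point
)
noise <- dbscan_noise(toy, eps = 0.25, minPts = 3)
print(which(noise))
# [1] 7

# With the data from this exercise (not run here):
# customers[dbscan_noise(customers, eps = 0.4, minPts = 5), ]
```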

Exercise 9
Compare the results obtained in the previous exercise with the results of the k-means algorithm. First, find clusters using this algorithm:

Use the same data set, but get rid of outliers for both variables (here the outliers may be defined as values beyond 2.5 standard deviations from the mean; note that the values are already expressed in units of standard deviation about the mean). Assign the new data set to the customers_core variable.
Use the kmeans function to obtain an object with cluster assignments. Set the number of centers equal to 4, and the number of initial random sets (the nstart parameter) equal to 10. Assign the obtained object to the km_clusters_customers variable.
Plot clusters using the fviz_cluster function (as in the previous exercise).
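Because the data are already scaled, "beyond 2.5 standard deviations" simply means an absolute value of at least 2.5. A compact base-R way to drop such rows in one step is sketched below (the function name drop_extreme_rows and the toy values are hypothetical); for the two columns used here it keeps the same rows as the two-step filter in the solution.

```r
# Keep rows whose every (scaled) value lies strictly inside (-limit, limit)
drop_extreme_rows <- function(z, limit = 2.5) {
  keep <- apply(abs(z) < limit, 1, all)   # TRUE when no column is extreme
  z[keep, , drop = FALSE]
}

z <- data.frame(Fresh = c(0, 3, -1, 0.5), Milk = c(1, 0, -2.6, 2.4))
kept <- drop_extreme_rows(z)
print(kept)   # rows 2 (Fresh = 3) and 3 (Milk = -2.6) are removed

# With the data from this exercise (not run here):
# customers_core <- drop_extreme_rows(customers)
```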

Exercise 10
Now compare the results of DBSCAN and k-means using silhouette analysis:

Retrieve a vector of cluster assignments from the db_clusters_customers object.
Calculate distances between data points in the customers data set using the dist function (with the default parameters).
Use the vector and the distances object as inputs to the silhouette function from the cluster package to get a silhouette data object.
Plot that object with the fviz_silhouette function from the factoextra package.
Repeat the steps described above for the km_clusters_customers object and the customers_core data set.
Compare the two plots and the average silhouette width values.
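The silhouette width of a point i is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from i to the other members of its own cluster and b(i) is the smallest mean distance from i to the members of any other cluster; values near 1 mean well-separated points, values near 0 or below mean ambiguous ones. The base-R sketch below (the function name silhouette_widths is hypothetical) computes this for illustration; it is not the cluster package's code, although it follows the usual convention of giving singleton clusters a width of 0.

```r
silhouette_widths <- function(cl, d) {
  d <- as.matrix(d)                       # full n x n distance matrix
  groups <- unique(cl)
  stopifnot(length(groups) >= 2)          # b(i) needs at least one other cluster
  s <- numeric(length(cl))
  for (i in seq_along(cl)) {
    own <- cl == cl[i]
    if (sum(own) == 1) next               # singleton cluster: s(i) stays 0
    # a(i): mean distance to the other members of i's cluster (d[i, i] is 0)
    a <- sum(d[i, own]) / (sum(own) - 1)
    # b(i): smallest mean distance to the members of any other cluster
    b <- min(sapply(setdiff(groups, cl[i]), function(g) mean(d[i, cl == g])))
    s[i] <- (b - a) / max(a, b)
  }
  s
}

x <- c(0, 1, 10, 11)                      # two obvious 1-D groups
grp <- c(1, 1, 2, 2)
sw <- silhouette_widths(grp, dist(x))
print(round(sw, 3))
# [1] 0.905 0.895 0.895 0.905
mean(sw)                                  # average silhouette width
```

Note that in the solution to this exercise the DBSCAN noise points (label 0) enter the calculation as if they formed a cluster of their own, which is why the output lists a cluster 0 of size 22.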

________________________________________

Below are the solutions to these exercises on density-based clustering.

####################
#                  #
#    Exercise 1    #
#                  #
####################
df <- iris[, -ncol(iris)]

####################
#                  #
#    Exercise 2    #
#                  #
####################
df <- scale(df)
df <- as.data.frame(df)

####################
#                  #
#    Exercise 3    #
#                  #
####################
require(dbscan)
kNNdistplot(df, k = 5)
abline(h = 0.8, col = "red")


####################
#                  #
#    Exercise 4    #
#                  #
####################
require(dbscan)
db_clusters_iris <- dbscan(df, eps = 0.8, minPts = 5)
print(db_clusters_iris)

## DBSCAN clustering for 150 objects.
## Parameters: eps = 0.8, minPts = 5
## The clustering contains 2 cluster(s) and 4 noise points.
##
##  0  1  2
##  4 49 97
##
## Available fields: cluster, eps, minPts

####################
#                  #
#    Exercise 5    #
#                  #
####################
require(factoextra)
fviz_cluster(db_clusters_iris, df, ellipse = FALSE, geom = "point")

####################
#                  #
#    Exercise 6    #
#                  #
####################
df_copy <- df
df_copy[['cluster']] <- db_clusters_iris[['cluster']]
print(head(df_copy))

##   Sepal.Length Sepal.Width Petal.Length Petal.Width cluster
## 1   -0.8976739  1.01560199    -1.335752   -1.311052       1
## 2   -1.1392005 -0.13153881    -1.335752   -1.311052       1
## 3   -1.3807271  0.32731751    -1.392399   -1.311052       1
## 4   -1.5014904  0.09788935    -1.279104   -1.311052       1
## 5   -1.0184372  1.24503015    -1.335752   -1.311052       1
## 6   -0.5353840  1.93331463    -1.165809   -1.048667       1

####################
#                  #
#    Exercise 7    #
#                  #
####################
require(dbscan)
require(factoextra)

# create a vector of epsilon values
epsilon_values <- c(1.8, 0.5, 0.4)

# plot the distribution of distances
kNNdistplot(df, k = 5)

# plot lines at epsilon values
for (e in epsilon_values) {
  abline(h = e, col = "red")
}

# find clusters for each epsilon value and plot those clusters
# (minPts = 5, as in Exercise 4)
for (e in epsilon_values) {
  db_clusters_iris <- dbscan(df, eps = e, minPts = 5)
  title <- paste("Plot for epsilon =", e)
  g <- fviz_cluster(db_clusters_iris, df, ellipse = TRUE, geom = "point",
                    main = title)
  print(g)
}

####################
#                  #
#    Exercise 8    #
#                  #
####################
require(dbscan)
require(factoextra)

# load and prepare the data
customers <- read.csv("Wholesale customers data.csv")
customers <- customers[, c("Fresh", "Milk")]
customers <- scale(customers)
customers <- as.data.frame(customers)

# plot the distribution of distances to the 5th nearest neighbors
kNNdistplot(customers, k = 5)
abline(h = 0.4, col = "red")

# find clusters
db_clusters_customers <- dbscan(customers, eps = 0.4, minPts = 5)
print(db_clusters_customers)

## DBSCAN clustering for 440 objects.
## Parameters: eps = 0.4, minPts = 5
## The clustering contains 1 cluster(s) and 22 noise points.
##
##   0   1
##  22 418
##
## Available fields: cluster, eps, minPts

# plot clusters
fviz_cluster(db_clusters_customers, customers, ellipse = FALSE, geom = "point")

####################
#                  #
#    Exercise 9    #
#                  #
####################
require(factoextra)

# remove values beyond 2.5 standard deviations
customers_core <- customers[customers[['Fresh']] > -2.5 &
                              customers[['Fresh']] < 2.5, ]
customers_core <- customers_core[customers_core[['Milk']] > -2.5 &
                                   customers_core[['Milk']] < 2.5, ]

# find clusters and plot them
km_clusters_customers <- kmeans(customers_core, centers = 4, nstart = 10)
fviz_cluster(km_clusters_customers,
             customers_core,
             ellipse = FALSE,
             geom = "point")

####################
#                  #
#    Exercise 10   #
#                  #
####################
require(dbscan)
require(cluster)
require(factoextra)

## DBSCAN results

# retrieve a vector of cluster assignments
db_clusters_vector <- db_clusters_customers[['cluster']]

# calculate distances between data points
db_distances <- dist(customers)

# get a silhouette data object
db_silhouette <- silhouette(db_clusters_vector, db_distances)

# plot the silhouette
fviz_silhouette(db_silhouette)

##   cluster size ave.sil.width
## 0       0   22         -0.02
## 1       1  418          0.72

## k-means results

# retrieve a vector of cluster assignments
km_clusters_vector <- km_clusters_customers[['cluster']]

# calculate distances between data points
km_distances <- dist(customers_core)

# get a silhouette data object
km_silhouette <- silhouette(km_clusters_vector, km_distances)

# plot the silhouette
fviz_silhouette(km_silhouette)

##   cluster size ave.sil.width
## 1       1   47          0.28
## 2       2  190          0.46
## 3       3   69          0.37
## 4       4  113          0.41

Source: http://engdashboard.blogspot.com/
