Random Forest

Trains a random forest on the training dataset and uses it to predict the classification of the test dataset. The function returns the accuracy, sensitivity and specificity on both the training and test datasets, together with a summary of the importance of each feature.
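The workflow described above can be sketched with the CRAN `randomForest` package. This is a minimal sketch, not the package's actual implementation: it assumes the function wraps `randomForest::randomForest` (the `MeanDecreaseGini` column in the example output suggests this, but it is not confirmed here), the name `rf_sketch` is hypothetical, and only accuracy and importance are computed.

```r
# Hypothetical sketch, not the package's actual implementation.
# Assumes the CRAN package 'randomForest' is installed.
library(randomForest)

rf_sketch <- function(data_train, data_test, numoftrees = 10) {
  # Fit a forest using every non-classification column as a feature;
  # importance = TRUE also records permutation (MeanDecreaseAccuracy) importance
  model <- randomForest(classification ~ ., data = data_train,
                        ntree = numoftrees, importance = TRUE)

  # Passing newdata gives in-bag predictions on the training set
  # (omitting it would return out-of-bag predictions instead)
  pred_train <- predict(model, newdata = data_train)
  pred_test  <- predict(model, newdata = data_test)

  list(
    training   = mean(as.character(pred_train) == as.character(data_train$classification)),
    test       = mean(as.character(pred_test) == as.character(data_test$classification)),
    importance = importance(model)
  )
}
```

Whether the package measures training accuracy in-sample or out-of-bag is not stated; this sketch predicts on the training data directly. Results vary between runs unless a seed is set with `set.seed()`.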

randomforest(data_train, data_test, numoftrees = 10, includeplot = FALSE)

Arguments

data_train

Training set: a data frame containing a classification column (a factor); all other columns are treated as features. This is the dataset on which the random forest model is trained.

data_test

Test set: a data frame containing a classification column (a factor); all other columns are treated as features. This is the dataset on which the random forest model is tested.
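When only a single labelled dataset is available, `data_train` and `data_test` can be produced by a random split. In this minimal sketch the data frame `df`, its feature columns `A` and `B`, and the 70/30 split ratio are all hypothetical choices for illustration.

```r
# Hypothetical full dataset: a factor 'classification' column plus features
set.seed(42)
df <- data.frame(
  classification = as.factor(sample(c(0, 1), 20, replace = TRUE)),
  A = sample(0:1, 20, replace = TRUE),
  B = sample(0:1, 20, replace = TRUE)
)

# Randomly assign about 70% of the rows to training and the rest to testing
train_idx  <- sample(nrow(df), size = round(0.7 * nrow(df)))
data_train <- df[train_idx, ]
data_test  <- df[-train_idx, ]
```

Both subsets keep the same columns, so they can be passed directly as `data_train` and `data_test`.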

numoftrees

Number of trees grown in the random forest (default: 10)

includeplot

If TRUE, show a performance scatter plot (default: FALSE)

Value

A list of performance measures, given as proportions between 0 and 1, accessed by name: training (training accuracy), test (test accuracy), trainsensitivity, testsensitivity, trainspecificity and testspecificity. The element importance is a matrix of feature importance measures with one row per feature, including the Mean Decrease in Gini index in the MeanDecreaseGini column. This can be used to find the features that contribute most to classification.
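Sensitivity and specificity follow from counting correct and incorrect predictions per class. This minimal sketch assumes class `1` is the positive class, which is consistent with the training figures in the example (5 of 6 positives and 1 of 4 negatives correct gives 0.8333, 0.25 and an accuracy of 0.6). The helper `binary_metrics` is hypothetical, and the labels below were constructed to reproduce those figures.

```r
# Hypothetical helper: accuracy, sensitivity and specificity for a binary
# classifier, treating `positive` as the positive class (assumed "1")
binary_metrics <- function(predicted, actual, positive = "1") {
  predicted <- as.character(predicted)
  actual    <- as.character(actual)

  tp <- sum(predicted == positive & actual == positive)  # true positives
  tn <- sum(predicted != positive & actual != positive)  # true negatives
  fp <- sum(predicted == positive & actual != positive)  # false positives
  fn <- sum(predicted != positive & actual == positive)  # false negatives

  list(
    accuracy    = (tp + tn) / length(actual),
    sensitivity = tp / (tp + fn),  # share of actual positives predicted positive
    specificity = tn / (tn + fp)   # share of actual negatives predicted negative
  )
}

# Constructed labels: 6 positives (5 found) and 4 negatives (1 found)
actual    <- c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0)
predicted <- c(1, 1, 1, 1, 1, 0, 1, 1, 1, 0)
m <- binary_metrics(predicted, actual)
```

Here `m$accuracy` is 0.6, `m$sensitivity` is 5/6 and `m$specificity` is 0.25.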

Examples

data_train <- data.frame(
  classification = as.factor(c(1, 1, 0, 0, 1, 1, 0, 0, 1, 1)),
  A = c(1, 1, 1, 0, 0, 0, 1, 1, 1, 0),
  B = c(0, 1, 1, 0, 1, 1, 0, 1, 1, 0),
  C = c(0, 0, 1, 0, 0, 1, 0, 0, 1, 0)
)
data_test <- data.frame(
  classification = as.factor(c(1, 1, 0, 0, 1, 1, 1, 0)),
  A = c(0, 0, 0, 1, 0, 0, 0, 1),
  B = c(1, 1, 1, 0, 0, 1, 1, 1),
  C = c(0, 0, 1, 1, 0, 0, 1, 1)
)
randomforest(data_train, data_test, numoftrees = 5)
#> $training
#> [1] 0.6
#>
#> $test
#> [1] 0.625
#>
#> $testsensitivity
#> [1] 0
#>
#> $testspecificity
#> [1] 0
#>
#> $trainsensitivity
#> [1] 0.8333333
#>
#> $trainspecificity
#> [1] 0.25
#>
#> $importance
#>           0         1 MeanDecreaseAccuracy MeanDecreaseGini
#> A -1.118034 -2.637522            -2.360294        0.5942857
#> B -1.118034 -1.767767            -1.795462        0.7400000
#> C -1.767767 -1.118034            -1.574645        0.4857143
#>
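To find the features that contribute most to classification, the importance matrix can be sorted by its MeanDecreaseGini column. In this sketch a matrix copied from the printed output above stands in for `result$importance`, where `result` is assumed to hold the list returned by the `randomforest()` call.

```r
# Stand-in for result$importance, copied from the printed example output
imp <- matrix(
  c(-1.118034, -2.637522, -2.360294, 0.5942857,
    -1.118034, -1.767767, -1.795462, 0.7400000,
    -1.767767, -1.118034, -1.574645, 0.4857143),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"),
                  c("0", "1", "MeanDecreaseAccuracy", "MeanDecreaseGini"))
)

# Order features from most to least important by mean decrease in Gini index
ranked <- imp[order(imp[, "MeanDecreaseGini"], decreasing = TRUE), , drop = FALSE]
rownames(ranked)
#> [1] "B" "A" "C"
```

With the real return value, `imp` is simply `result$importance`; the other elements, such as `result$test`, are accessed the same way.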