randomforest.Rd
Random Forest. Trains a random forest on the training dataset and uses it to predict the classification of the test dataset. The resulting accuracy, sensitivity and specificity are returned, as well as a summary of the importance of features in the dataset.
randomforest(data_train, data_test, numoftrees = 10, includeplot = FALSE)
data_train | Training set: data frame containing a classification column, with all other columns treated as features. This is the dataset on which the random forest model is trained. |
---|---|
data_test | Test set: data frame containing a classification column, with all other columns treated as features. This is the dataset on which the random forest model is tested. |
numoftrees | Number of trees used in the random forest (default: 10) |
includeplot | Show performance scatter plot (default: FALSE) |
List of performance measures, reported as proportions (for example 0.6 rather than 60%): training (training accuracy), test (test accuracy), trainsensitivity, testsensitivity, trainspecificity and testspecificity. The importance element holds the feature importance measures, including the Mean Decrease in Gini index, which can be used to find the features that contribute most to classification.
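The accuracy, sensitivity and specificity values can be illustrated with a small base R sketch. The helper `classification_metrics` is hypothetical and not part of this package, the choice of "1" as the positive class is an assumption, and the `predicted` vector is made up for illustration rather than produced by a model.

```r
# Hypothetical helper, not part of the documented package.
# Assumes a binary outcome where `positive` names the positive class.
classification_metrics <- function(actual, predicted, positive = "1") {
  actual <- as.character(actual)
  predicted <- as.character(predicted)
  tp <- sum(actual == positive & predicted == positive)  # true positives
  tn <- sum(actual != positive & predicted != positive)  # true negatives
  fp <- sum(actual != positive & predicted == positive)  # false positives
  fn <- sum(actual == positive & predicted != positive)  # false negatives
  list(
    accuracy    = (tp + tn) / length(actual),
    sensitivity = tp / (tp + fn),  # share of positives correctly identified
    specificity = tn / (tn + fp)   # share of negatives correctly identified
  )
}

# Illustrative labels only; `predicted` is not real model output.
actual    <- factor(c(1, 1, 0, 0, 1, 1, 1, 0))
predicted <- factor(c(1, 1, 0, 1, 0, 1, 1, 0))
m <- classification_metrics(actual, predicted)
```

For these labels the sketch gives an accuracy of 0.75, a sensitivity of 0.8 and a specificity of 2/3. How the package itself defines the positive class is not stated here, so check that before comparing numbers.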
data_train = data.frame(
  classification=as.factor(c(1,1,0,0,1,1,0,0,1,1)),
  A=c(1,1,1,0,0,0,1,1,1,0),
  B=c(0,1,1,0,1,1,0,1,1,0),
  C=c(0,0,1,0,0,1,0,0,1,0))
data_test = data.frame(
  classification=as.factor(c(1,1,0,0,1,1,1,0)),
  A=c(0,0,0,1,0,0,0,1),
  B=c(1,1,1,0,0,1,1,1),
  C=c(0,0,1,1,0,0,1,1))
randomforest(data_train,data_test,numoftrees=5)
#> $training
#> [1] 0.6
#>
#> $test
#> [1] 0.625
#>
#> $testsensitivity
#> [1] 0
#>
#> $testspecificity
#> [1] 0
#>
#> $trainsensitivity
#> [1] 0.8333333
#>
#> $trainspecificity
#> [1] 0.25
#>
#> $importance
#>           0         1 MeanDecreaseAccuracy MeanDecreaseGini
#> A -1.118034 -2.637522            -2.360294        0.5942857
#> B -1.118034 -1.767767            -1.795462        0.7400000
#> C -1.767767 -1.118034            -1.574645        0.4857143
#>
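To find the features that contribute most to classification, the importance values can be ranked by Mean Decrease in Gini. The sketch below rebuilds the importance table from the example output by hand so that it runs on its own. It assumes the importance element is a matrix with these row and column names, as the printed output suggests.

```r
# Importance values copied from the example output; in practice this would
# presumably come from the `importance` element of the returned list.
importance <- matrix(
  c(-1.118034, -2.637522, -2.360294, 0.5942857,
    -1.118034, -1.767767, -1.795462, 0.7400000,
    -1.767767, -1.118034, -1.574645, 0.4857143),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"),
                  c("0", "1", "MeanDecreaseAccuracy", "MeanDecreaseGini"))
)

# Extract the Gini column as a named vector and sort from most to least important.
gini <- importance[, "MeanDecreaseGini"]
ranked <- sort(gini, decreasing = TRUE)
top_features <- names(ranked)
```

With these values the ranking is B, A, C. With so few rows and only five trees, the ranking is unlikely to be stable from run to run.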