eGA.Rd
Embryonic Genetic Algorithm. Feature selection based on Embryonic Genetic Algorithms. It performs feature selection by maintaining an ongoing set of 'good' features, which is improved run by run. It outputs training and test accuracy, sensitivity and specificity, and a list of <=k features.
eGA( k = 30, data_train, data_test, mutprob = 0.05, includePlot = FALSE, maxnumruns = 50 )
k | Maximum number of features in the output feature set (default:30) |
---|---|
data_train | Training set: data frame containing a classification column, with all other columns treated as features. This is the dataset on which the decision tree model is trained. |
data_test | Test set: data frame containing a classification column, with all other columns treated as features. This is the dataset on which the decision tree model is tested. |
mutprob | Probability that mutation is applied to each feature set produced by forward feature selection (default:0.05) |
includePlot | Show performance scatter plot (default:FALSE) |
maxnumruns | Maximum number of iterations after which the feature set will be output, if no other termination conditions have been met (default:50) |
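
As an illustration of the input format described above, the following sketch checks that a training and a test data frame both carry a factor `classification` column and share the same feature columns. The helper name `check_ega_input` is hypothetical and is not part of feamiR; it only reflects the argument descriptions in the table.

```r
# Hypothetical helper (not part of feamiR): sanity-check inputs before calling eGA.
check_ega_input <- function(data_train, data_test) {
  for (d in list(data_train, data_test)) {
    stopifnot(is.data.frame(d),
              "classification" %in% names(d),
              is.factor(d$classification))
  }
  # Feature columns are every column except `classification`.
  train_features <- setdiff(names(data_train), "classification")
  test_features  <- setdiff(names(data_test), "classification")
  # Assumption: both sets are expected to expose the same features.
  stopifnot(setequal(train_features, test_features))
  invisible(train_features)
}

# Small made-up data frames in the documented layout.
data_train <- data.frame(
  classification = as.factor(c(1, 1, 0, 0)),
  A = c(1, 1, 1, 0),
  B = c(0, 1, 1, 0)
)
data_test <- data.frame(
  classification = as.factor(c(1, 0)),
  A = c(0, 1),
  B = c(1, 0)
)
features <- check_ega_input(data_train, data_test)
print(features)
```

Running a check like this first makes a malformed `classification` column fail early, before any model is trained.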
List containing the ordered list of selected features (feature_list) and the performance measures, accessed using training (training accuracy), test (test accuracy), trainsensitivity, testsensitivity, trainspecificity and testspecificity. The element listofongoing is a list containing the length of the ongoing set at each stage.
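
For readers unfamiliar with the reported measures, here is a minimal sketch of how accuracy, sensitivity and specificity are conventionally computed from actual and predicted labels. The function `classification_metrics` is hypothetical, and treating class `1` as the positive class is an assumption; the source does not state which class eGA treats as positive.

```r
# Sketch only: standard definitions, not taken from the eGA source code.
# Assumption: class "1" is the positive class.
classification_metrics <- function(actual, predicted, positive = "1") {
  actual <- as.character(actual)
  predicted <- as.character(predicted)
  tp <- sum(actual == positive & predicted == positive)  # true positives
  tn <- sum(actual != positive & predicted != positive)  # true negatives
  fp <- sum(actual != positive & predicted == positive)  # false positives
  fn <- sum(actual == positive & predicted != positive)  # false negatives
  list(
    accuracy    = (tp + tn) / length(actual),
    sensitivity = tp / (tp + fn),  # share of positives correctly identified
    specificity = tn / (tn + fp)   # share of negatives correctly identified
  )
}

# Made-up labels for illustration.
actual    <- factor(c(1, 1, 1, 1, 0, 0, 0, 0))
predicted <- factor(c(1, 1, 1, 0, 0, 0, 1, 1))
m <- classification_metrics(actual, predicted)
print(m)
```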
data_train = data.frame(
  classification=as.factor(c(1,1,0,0,1,1,0,0,1,1)),
  A=c(1,1,1,0,0,0,1,1,1,0),
  B=c(0,1,1,0,1,1,0,1,1,0),
  C=c(0,0,1,0,0,1,0,0,1,0),
  D=c(0,1,1,0,0,0,1,0,0,0),
  E=c(1,0,1,0,0,1,0,1,1,0))
data_test = data.frame(
  classification=as.factor(c(1,1,0,0,1,1,1,0)),
  A=c(0,0,0,1,0,0,0,1),
  B=c(1,1,1,0,0,1,1,1),
  C=c(0,0,1,1,0,0,1,1),
  D=c(0,0,1,1,0,1,0,1),
  E=c(0,0,1,0,1,0,1,1))
data = read.csv(paste(system.file('samples/subsamples', package = "feamiR"),'/sample0.csv',sep=''))
data = rbind(head(data,50),tail(data,50))
data$classification = as.factor(data$classification)
ind <- sample(2,nrow(data),replace=TRUE,prob=c(0.8,0.2))
data_train <- data[ind==1,]
data_test <- data[ind==2,]
eGA(k=7,data_train,data_test,maxnumruns=3)
#> $feature_list
#> [1] "position1_U" "position5_G" "pair2_UG" "pair6_AU"
#>
#> $training
#> [1] 0.7926829
#>
#> $test
#> [1] 0.7222222
#>
#> $trainspecificity
#> [1] 0.6744186
#>
#> $testspecificity
#> [1] 0.8571429
#>
#> $trainsensitivity
#> [1] 0.9230769
#>
#> $testsensitivity
#> [1] 0.6363636
#>
#> $listofongoing
#> $listofongoing[[1]]
#> [1] 4
#>
#> $listofongoing[[2]]
#> [1] 4
#>
#> $listofongoing[[3]]
#> [1] 4
#>
#>
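
The returned list shown in the example output can be inspected with ordinary list indexing. The sketch below uses a hand-built list `res` that copies the field names and values from that output; it is not produced by running `eGA` here, and the variable names `res`, `ongoing_sizes` and `summary_table` are illustrative only.

```r
# Hand-built stand-in for the value returned by eGA(k=7, ...), copied from the
# example output above.
res <- list(
  feature_list = c("position1_U", "position5_G", "pair2_UG", "pair6_AU"),
  training = 0.7926829,
  test = 0.7222222,
  trainspecificity = 0.6744186,
  testspecificity = 0.8571429,
  trainsensitivity = 0.9230769,
  testsensitivity = 0.6363636,
  listofongoing = list(4, 4, 4)
)

# The feature list should never exceed the requested maximum k.
k <- 7
stopifnot(length(res$feature_list) <= k)

# Flatten the per-run ongoing-set sizes into a plain numeric vector.
ongoing_sizes <- unlist(res$listofongoing)

# Collect train and test measures side by side for easier comparison.
summary_table <- data.frame(
  metric = c("accuracy", "sensitivity", "specificity"),
  train = c(res$training, res$trainsensitivity, res$trainspecificity),
  test = c(res$test, res$testsensitivity, res$testspecificity)
)
print(summary_table)
print(ongoing_sizes)
```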