Embryonic Genetic Algorithm. Feature selection based on an Embryonic Genetic Algorithm. It performs feature selection by maintaining an ongoing set of 'good' features which is improved run by run. It outputs training and test accuracy, sensitivity and specificity, and a list of at most k features.

eGA(
  k = 30,
  data_train,
  data_test,
  mutprob = 0.05,
  includePlot = FALSE,
  maxnumruns = 50
)
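
The sketch below is purely illustrative and is not the feamiR implementation: it outlines one way a run-by-run scheme like the one described above could look, combining a forward-selection step with mutation (applied with probability mutprob) and an ongoing set of 'good' features. The helpers score_features and eGA_sketch, and the use of rpart for the decision tree, are assumptions made for this sketch.

# Illustrative sketch only: a simplified embryonic-GA-style loop, not feamiR's code.
library(rpart)

# Hypothetical helper: score a feature subset by training a decision tree on
# data_train and measuring accuracy on data_test.
score_features <- function(features, data_train, data_test) {
  fit <- rpart(classification ~ .,
               data = data_train[, c("classification", features)],
               method = "class")
  pred <- predict(fit, newdata = data_test[, features, drop = FALSE], type = "class")
  mean(pred == data_test$classification)
}

# Hypothetical run-by-run loop; assumes at least one feature column.
eGA_sketch <- function(data_train, data_test, k = 30, mutprob = 0.05, maxnumruns = 50) {
  all_features <- setdiff(colnames(data_train), "classification")
  ongoing <- character(0)        # ongoing set of 'good' features
  listofongoing <- list()

  for (run in seq_len(maxnumruns)) {
    # Forward step: add the single candidate feature that most improves accuracy.
    candidates <- setdiff(all_features, ongoing)
    if (length(candidates) == 0 || length(ongoing) >= k) break
    scores <- sapply(candidates, function(f)
      score_features(c(ongoing, f), data_train, data_test))
    ongoing <- c(ongoing, candidates[which.max(scores)])

    # Mutation: with probability mutprob, drop a randomly chosen feature.
    if (length(ongoing) > 1 && runif(1) < mutprob) {
      ongoing <- ongoing[-sample(length(ongoing), 1)]
    }
    listofongoing[[run]] <- length(ongoing)
  }

  list(feature_list = ongoing,
       test = score_features(ongoing, data_train, data_test),
       listofongoing = listofongoing)
}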

Arguments

k

Maximum number of features in the output feature set (default:30)

data_train

Training set: a dataframe containing a classification column, with all other columns treated as features. This is the dataset on which the decision tree model is trained.

data_test

Test set: a dataframe containing a classification column, with all other columns treated as features. This is the dataset on which the decision tree model is tested.

mutprob

Probability that mutation is applied to each feature set produced by forward feature selection (default:0.05)

includePlot

Whether to show a performance scatter plot (default:FALSE)

maxnumruns

Maximum number of iterations after which the feature set will be output, if no other termination conditions have been met (default:50)
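
For reference, a call setting the tuning arguments above explicitly might look as follows. The values are chosen arbitrarily for illustration, and data_train and data_test are assumed to be constructed as in the Examples section below.

# Illustrative call with non-default tuning arguments.
res <- eGA(
  k = 10,                 # return at most 10 features
  data_train = data_train,
  data_test = data_test,
  mutprob = 0.1,          # higher mutation probability than the default 0.05
  includePlot = TRUE,     # display the performance scatter plot
  maxnumruns = 20         # stop after at most 20 iterations
)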

Value

List containing an ordered list of selected features (feature_list) and performance percentages, accessed using training (training accuracy), test (test accuracy), trainsensitivity, testsensitivity, trainspecificity and testspecificity. The listofongoing component is a list containing the length of the ongoing set at each stage.
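
A minimal sketch of accessing the returned components; the object name res is arbitrary, and the component names follow the description above and the example output below.

# Assuming 'res' holds the result of a call to eGA().
res$feature_list        # ordered selection of at most k features
res$training            # training accuracy
res$test                # test accuracy
res$trainsensitivity    # sensitivity on the training set
res$testsensitivity     # sensitivity on the test set
res$trainspecificity    # specificity on the training set
res$testspecificity     # specificity on the test set
res$listofongoing       # length of the ongoing set at each stage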

Examples

data_train = data.frame(
  classification=as.factor(c(1,1,0,0,1,1,0,0,1,1)),
  A=c(1,1,1,0,0,0,1,1,1,0),
  B=c(0,1,1,0,1,1,0,1,1,0),
  C=c(0,0,1,0,0,1,0,0,1,0),
  D=c(0,1,1,0,0,0,1,0,0,0),
  E=c(1,0,1,0,0,1,0,1,1,0))
data_test = data.frame(
  classification=as.factor(c(1,1,0,0,1,1,1,0)),
  A=c(0,0,0,1,0,0,0,1),
  B=c(1,1,1,0,0,1,1,1),
  C=c(0,0,1,1,0,0,1,1),
  D=c(0,0,1,1,0,1,0,1),
  E=c(0,0,1,0,1,0,1,1))
data = read.csv(paste(system.file('samples/subsamples', package = "feamiR"),'/sample0.csv',sep=''))
data = rbind(head(data,50),tail(data,50))
data$classification = as.factor(data$classification)
ind <- sample(2,nrow(data),replace=TRUE,prob=c(0.8,0.2))
data_train <- data[ind==1,]
data_test <- data[ind==2,]
eGA(k=7,data_train,data_test,maxnumruns=3)
#> $feature_list
#> [1] "position1_U" "position5_G" "pair2_UG"    "pair6_AU"
#>
#> $training
#> [1] 0.7926829
#>
#> $test
#> [1] 0.7222222
#>
#> $trainspecificity
#> [1] 0.6744186
#>
#> $testspecificity
#> [1] 0.8571429
#>
#> $trainsensitivity
#> [1] 0.9230769
#>
#> $testsensitivity
#> [1] 0.6363636
#>
#> $listofongoing
#> $listofongoing[[1]]
#> [1] 4
#>
#> $listofongoing[[2]]
#> [1] 4
#>
#> $listofongoing[[3]]
#> [1] 4
#>