Embryonic Genetic Algorithm

Feature selection based on Embryonic Genetic Algorithms. It performs feature selection by maintaining an ongoing set of 'good' features, which is improved run by run. It outputs training and test accuracy, sensitivity and specificity, and a list of at most k features.

eGA(
  k = 30,
  data_train,
  data_test,
  mutprob = 0.05,
  includePlot = FALSE,
  maxnumruns = 50
)

Arguments

k	Maximum number of features in the output feature set (default: 30)
data_train	Training set: data frame containing a classification column, with all other columns treated as features. This is the dataset on which the decision tree model is trained.
data_test	Test set: data frame containing a classification column, with all other columns treated as features. This is the dataset on which the decision tree model is tested.
mutprob	Probability that mutation is performed on each feature set produced by forward feature selection (default: 0.05)
includePlot	If TRUE, show a performance scatter plot (default: FALSE)
maxnumruns	Maximum number of runs (iterations), after which the current feature set is output if no other termination condition has been met (default: 50)
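To illustrate how k, mutprob and maxnumruns can interact in an "ongoing set" search, here is a simplified sketch. It is not the feamiR implementation of eGA: the function toy_ongoing_search is hypothetical, and it assumes a user-supplied fitness() function (where eGA would score feature sets with a decision tree), greedy forward selection starting from the ongoing set, and mutation implemented as swapping one selected feature for an unused one.

```r
# Hypothetical sketch, NOT the feamiR implementation of eGA.
# Assumptions: fitness() is a user-supplied function scoring a character
# vector of feature names (it must also accept an empty vector); forward
# selection is greedy; mutation swaps one feature for an unused one.
toy_ongoing_search <- function(features, fitness, k = 30, mutprob = 0.05,
                               maxnumruns = 50) {
  ongoing <- character(0)                  # current set of 'good' features
  best <- -Inf                             # score of the ongoing set
  sizes <- vector("list", maxnumruns)      # ongoing-set length after each run
  for (run in seq_len(maxnumruns)) {
    # Forward selection, starting from the ongoing set: repeatedly add the
    # feature that most improves the score, up to k features
    candidate <- ongoing
    cand_score <- fitness(candidate)
    repeat {
      remaining <- setdiff(features, candidate)
      if (length(remaining) == 0 || length(candidate) >= k) break
      scores <- vapply(remaining, function(f) fitness(c(candidate, f)),
                       numeric(1))
      if (max(scores) <= cand_score) break
      candidate <- c(candidate, remaining[which.max(scores)])
      cand_score <- max(scores)
    }
    # Mutation: with probability mutprob, swap one selected feature for a
    # randomly chosen unused feature
    outside <- setdiff(features, candidate)
    if (runif(1) < mutprob && length(candidate) > 0 && length(outside) > 0) {
      i <- sample.int(length(candidate), 1)
      candidate[i] <- outside[sample.int(length(outside), 1)]
      cand_score <- fitness(candidate)
    }
    # The ongoing set is replaced only when the new set scores better
    if (cand_score > best) {
      ongoing <- candidate
      best <- cand_score
    }
    sizes[[run]] <- length(ongoing)
  }
  list(feature_list = ongoing, score = best, listofongoing = sizes)
}

# Toy usage: features "A" and "C" are informative, every feature costs 0.1
good <- c("A", "C")
fit <- function(s) sum(s %in% good) - 0.1 * length(s)
res <- toy_ongoing_search(LETTERS[1:5], fit, k = 3, mutprob = 0, maxnumruns = 4)
res$feature_list  # "A" "C"
```

In this sketch a run can only replace the ongoing set with a better-scoring one, so with mutprob = 0 the search settles after the first run; mutation is the only source of further exploration.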

Value

A list containing the ordered list of selected features (feature_list) and the performance measures, given as proportions between 0 and 1: training (training accuracy), test (test accuracy), trainsensitivity, testsensitivity, trainspecificity and testspecificity. The element listofongoing is a list containing the length of the ongoing set at each stage.
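For reference, the accuracy, sensitivity and specificity measures follow the usual confusion-matrix definitions. The helper below is hypothetical and not part of feamiR; in particular it assumes that class "1" is the positive class, which this page does not state.

```r
# Hypothetical helper, not part of feamiR: accuracy, sensitivity and
# specificity of binary predictions. ASSUMPTION: "1" is the positive class.
binary_metrics <- function(actual, predicted, positive = "1") {
  actual <- as.character(actual)
  predicted <- as.character(predicted)
  tp <- sum(predicted == positive & actual == positive)  # true positives
  tn <- sum(predicted != positive & actual != positive)  # true negatives
  fp <- sum(predicted == positive & actual != positive)  # false positives
  fn <- sum(predicted != positive & actual == positive)  # false negatives
  list(accuracy    = (tp + tn) / length(actual),
       sensitivity = tp / (tp + fn),  # share of positives correctly identified
       specificity = tn / (tn + fp))  # share of negatives correctly identified
}

actual    <- factor(c(1, 1, 0, 0, 1))
predicted <- factor(c(1, 0, 0, 1, 1))
m <- binary_metrics(actual, predicted)
m$accuracy     # 0.6
m$sensitivity  # 0.6666667
m$specificity  # 0.5
```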

Examples

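# Small toy training and test sets: a factor 'classification' column
# plus binary feature columns A-E
data_train = data.frame(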
      classification=as.factor(c(1,1,0,0,1,1,0,0,1,1)),
      A=c(1,1,1,0,0,0,1,1,1,0),
      B=c(0,1,1,0,1,1,0,1,1,0),
      C=c(0,0,1,0,0,1,0,0,1,0),
      D=c(0,1,1,0,0,0,1,0,0,0),
      E=c(1,0,1,0,0,1,0,1,1,0))
data_test = data.frame(
      classification=as.factor(c(1,1,0,0,1,1,1,0)),
      A=c(0,0,0,1,0,0,0,1),
      B=c(1,1,1,0,0,1,1,1),
      C=c(0,0,1,1,0,0,1,1),
      D=c(0,0,1,1,0,1,0,1),
      E=c(0,0,1,0,1,0,1,1))
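# Alternatively, use the sample data shipped with feamiR
# (these assignments overwrite the toy data frames above)
data = read.csv(file.path(system.file('samples/subsamples', package = "feamiR"), 'sample0.csv'))
# Keep the first 50 and last 50 rows
data = rbind(head(data,50),tail(data,50))
data$classification = as.factor(data$classification)
# Random ~80/20 train/test split; results vary between runs unless a seed is set
ind <- sample(2,nrow(data),replace=TRUE,prob=c(0.8,0.2))
data_train <- data[ind==1,]
data_test <- data[ind==2,]
# Select at most 7 features, stopping after at most 3 runs
eGA(k=7,data_train,data_test,maxnumruns=3)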
#> $feature_list
#> [1] "position1_U" "position5_G" "pair2_UG"    "pair6_AU"   
#> 
#> $training
#> [1] 0.7926829
#> 
#> $test
#> [1] 0.7222222
#> 
#> $trainspecificity
#> [1] 0.6744186
#> 
#> $testspecificity
#> [1] 0.8571429
#> 
#> $trainsensitivity
#> [1] 0.9230769
#> 
#> $testsensitivity
#> [1] 0.6363636
#> 
#> $listofongoing
#> $listofongoing[[1]]
#> [1] 4
#> 
#> $listofongoing[[2]]
#> [1] 4
#> 
#> $listofongoing[[3]]
#> [1] 4
#> 
#>