swag
swag is a package that trains a meta-learning procedure combining screening and wrapper methods to find a set of extremely low-dimensional attribute combinations.
First install the remotes package. Then install swag with the following code:
## if not installed
## install.packages("remotes")
remotes::install_github("SMAC-Group/SWAG-R-Package")
library(swag) #load the new package
We use the BreastCancer dataset, readily available from the mlbench package, to give an overview of swag.
# After having installed the mlbench package
data(BreastCancer, package = "mlbench")
# Pre-processing of the data
y <- BreastCancer$Class # response variable
x <- as.matrix(BreastCancer[setdiff(names(BreastCancer), c("Id", "Class"))]) # features
# remove missing values and change to 'numeric'
id <- which(apply(x, 1, function(x) sum(is.na(x))) > 0)
y <- y[-id]
x <- x[-id, ]
x <- apply(x, 2, as.numeric)
# Training and test set
set.seed(180) # for replication
ind <- sample(1:dim(x)[1], dim(x)[1] * 0.2)
y_test <- y[ind]
y_train <- y[-ind]
x_test <- x[ind, ]
x_train <- x[-ind, ]
Now we are ready to train with swag! The first step is to define the meta-parameters of the swag procedure: \(p_{max}\), the maximum dimension of attributes; \(\alpha\), a performance quantile which represents the percentage of learners selected at each dimension; and \(m\), the maximum number of learners trained at each dimension. We can set all these meta-parameters, together with a seed for replicability and verbose = TRUE to get a message as each dimension is completed, thanks to the swagControl() function, which behaves similarly to the trControl argument of caret.
# Meta-parameters chosen for the breast cancer dataset
swagcon <- swagControl(pmax = 4L,
                       alpha = 0.5,
                       m = 20L,
                       seed = 163L, # for replicability
                       verbose = TRUE) # keeps track of completed dimensions
# Given the low dimensional dataset, we can afford a wider search
# by fixing alpha = 0.5 as a smaller alpha may also stop the
# training procedure earlier than expected.
Having set up the meta-parameters as explained above, we are now ready to train with swag. We start with the linear Support Vector Machine learner:
### SVM Linear Learner ###
train_swag_svml <- swag(
  # arguments for swag
  x = x_train,
  y = y_train,
  control = swagcon,
  auto_control = FALSE,
  # arguments for caret
  trControl = caret::trainControl(method = "repeatedcv", number = 10, repeats = 1, allowParallel = F),
  metric = "Accuracy",
  method = "svmLinear", # use method = "svmRadial" to train this alternative learner
  preProcess = c("center", "scale")
)
## [1] "Dimension explored: 1 - CV errors at alpha: 0.115"
## [1] "Dimension explored: 2 - CV errors at alpha: 0.0549"
## [1] "Dimension explored: 3 - CV errors at alpha: 0.0403"
## [1] "Dimension explored: 4 - CV errors at alpha: 0.0394"
The only difference with respect to the classic caret train function is the specification of the swag arguments explained previously. In the above chunk for the svmLinear learner, we define the estimator of the out-of-sample accuracy as 10-fold cross-validation repeated once. For this specific case, we have chosen to center and rescale the data, as is usually done for SVMs, and the parameter that controls the margin in SVMs is automatically fixed at a unitary value (i.e. \(c=1\)).
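Should you want to explore different margin parameters instead, caret's usual tuneGrid argument can be passed through swag(). The following is a minimal sketch, not part of the original analysis: the column name C follows caret's convention for method = "svmLinear", and the grid values are purely illustrative.

# Hypothetical variation: tune the SVM cost parameter instead of fixing c = 1.
train_swag_svml_tuned <- swag(
  x = x_train,
  y = y_train,
  control = swagcon,
  auto_control = FALSE,
  trControl = caret::trainControl(method = "repeatedcv", number = 10, repeats = 1, allowParallel = F),
  metric = "Accuracy",
  method = "svmLinear",
  preProcess = c("center", "scale"),
  tuneGrid = expand.grid(C = c(0.1, 1, 10)) # illustrative grid of cost values
)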
Let’s have a look at the typical output of a swag training object for the svmLinear learner:
train_swag_svml$CVs
## [[1]]
## [1] 0.14094276 0.06959836 0.07499399 0.15157407 0.10811688 0.08592593 0.11502886
## [8] 0.12070707 0.22122896
##
## [[2]]
## [1] 0.05107744 0.06225950 0.03852213 0.05492304 0.06030544 0.04377104
## [7] 0.05108225 0.06212121 0.07485570 0.05491582
##
## [[3]]
## [1] 0.04010101 0.04761063 0.03848846 0.04030784 0.04575758 0.04016835 0.03841991
## [8] 0.04387205 0.05105099
##
## [[4]]
## [1] 0.03464646 0.04572751 0.04030664 0.03852213
# A list which contains the cv training errors of each learner explored in a given dimension
train_swag_svml$VarMat
## [[1]]
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,] 1 2 3 4 5 6 7 8 9
##
## [[2]]
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 2 2 2 2 3 3 3 5 5 6
## [2,] 3 5 6 7 5 6 7 6 7 7
##
## [[3]]
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,] 2 2 2 3 2 2 3 3 5
## [2,] 3 3 6 6 3 5 5 5 6
## [3,] 6 7 7 7 5 6 6 7 7
##
## [[4]]
## [,1] [,2] [,3] [,4]
## [1,] 2 2 2 3
## [2,] 3 3 5 5
## [3,] 6 5 6 6
## [4,] 7 6 7 7
# A list containing a matrix, for each dimension, with the attributes tested at that step
train_swag_svml$cv_alpha
## [1] 0.11502886 0.05491943 0.04030784 0.03941438
# The cut-off cv training error, at each dimension, determined by the choice of alpha
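These elements are all you need to locate the best learner found by the procedure. The following base-R sketch (using only the element names shown above) recovers the dimension with the lowest CV error, the best learner within that dimension, and its attribute indices:

# A sketch based on the output elements shown above.
best_dim <- which.min(sapply(train_swag_svml$CVs, min)) # dimension with the lowest CV error
best_learner <- which.min(train_swag_svml$CVs[[best_dim]]) # best learner within that dimension
train_swag_svml$VarMat[[best_dim]][, best_learner] # attributes used by the best learner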
The other two learners that we have implemented in swag are the lasso (glmnet package required) and the random forest (randomForest package required, since we rely on caret's "rf" method). The training phase for these learners differs a little with respect to the SVM one. We can look at the random forest for a practical example:
### Random Forest Learner ###
train_swag_rf <- swag(
  # arguments for swag
  x = x,
  y = y,
  control = swagcon,
  auto_control = FALSE,
  # arguments for caret
  trControl = caret::trainControl(method = "repeatedcv", number = 10, repeats = 1, allowParallel = F),
  metric = "Accuracy",
  method = "rf",
  # dynamically modify arguments for caret
  caret_args_dyn = function(list_arg, iter){
    list_arg$tuneGrid = expand.grid(.mtry = sqrt(iter))
    list_arg
  }
)
## [1] "Dimension explored: 1 - CV errors at alpha: 0.0996"
## [1] "Dimension explored: 2 - CV errors at alpha: 0.0534"
## [1] "Dimension explored: 3 - CV errors at alpha: 0.0461"
## [1] "Dimension explored: 4 - CV errors at alpha: 0.0425"
The newly introduced argument caret_args_dyn enables the user to modify the hyper-parameters of a given learner dynamically, since they can change as the dimension grows up to the desired \(p_{max}\). In the example above, this allows us to adapt the mtry hyper-parameter as the dimension grows: we have fixed mtry to the square root of the number of attributes at each step, as is usually done in practice.
You can tailor the learning arguments of swag() as you like, for example by introducing grids for the hyper-parameters specific to a given learner, or by updating these grids as the dimension increases, similarly to what is usually done with the caret package. This gives you a wide range of possibilities and a lot of flexibility in the training phase, as sketched below.
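For instance, here is a minimal sketch of how one might train the lasso learner with a custom penalty grid. It assumes the lasso learner accepts caret arguments in the same way as the SVM and random forest examples above; the column names .alpha and .lambda follow caret's conventions for method = "glmnet" (with .alpha = 1 giving the lasso penalty), and the lambda values are illustrative only, not taken from the original analysis.

# Hypothetical example: lasso learner trained over an illustrative penalty grid.
train_swag_lasso <- swag(
  x = x_train,
  y = y_train,
  control = swagcon,
  auto_control = FALSE,
  trControl = caret::trainControl(method = "repeatedcv", number = 10, repeats = 1, allowParallel = F),
  metric = "Accuracy",
  method = "glmnet",
  tuneGrid = expand.grid(.alpha = 1, # alpha = 1 gives the lasso penalty
                         .lambda = 10^seq(-4, 0, length.out = 20)) # illustrative values
)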
To conclude this brief introduction, we present the usual predict() function, which can be applied to a swag-trained object similarly to many other packages in R. We pick the random forest learner for this purpose.
# best learner predictions
# if `newdata` is not specified, then predict gives predictions based on the training
# sample
sapply(predict(object = train_swag_rf), function(x) head(x))
## $predictions
## [,1]
## [1,] 1
## [2,] 1
## [3,] 1
## [4,] 1
## [5,] 1
## [6,] 2
##
## $models
## $models[[1]]
## [1] 3 5 6 7
# best learner predictions
best_pred <- predict(object = train_swag_rf,
                     newdata = x_test)
sapply(best_pred, function(x) head(x))
## $predictions
## [,1]
## [1,] 1
## [2,] 1
## [3,] 1
## [4,] 2
## [5,] 1
## [6,] 1
##
## $models
## $models[[1]]
## [1] 3 5 6 7
# predictions for a given dimension
dim_pred <- predict(
  object = train_swag_rf,
newdata = x_test,
type = "attribute",
attribute = 4L)
sapply(dim_pred,function(x) head(x))
## $predictions
## [,1] [,2] [,3] [,4]
## [1,] 1 1 1 1
## [2,] 1 1 1 1
## [3,] 1 1 1 1
## [4,] 2 2 2 2
## [5,] 1 1 1 1
## [6,] 1 1 1 1
##
## $models
## $models[[1]]
## [1] 2 3 5 6
##
## $models[[2]]
## [1] 2 3 5 7
##
## $models[[3]]
## [1] 3 5 6 7
##
## $models[[4]]
## [1] 2 3 6 7
# predictions below a given CV error
cv_pred <- predict(
  object = train_swag_rf,
newdata = x_test,
type = "cv_performance",
cv_performance = 0.04)
sapply(cv_pred,function(x) head(x))
## $predictions
## [,1]
## [1,] 1
## [2,] 1
## [3,] 1
## [4,] 2
## [5,] 1
## [6,] 1
##
## $models
## $models[[1]]
## [1] 3 5 6 7
Now we can evaluate the performance of the best learner selected by swag thanks to the confusionMatrix() function of caret.
# transform predictions into a factor with the levels of `y_test`
best_learn <- factor(levels(y_test)[best_pred$predictions])
caret::confusionMatrix(best_learn, y_test)
## Confusion Matrix and Statistics
##
## Reference
## Prediction benign malignant
## benign 90 0
## malignant 0 46
##
## Accuracy : 1
## 95% CI : (0.9732, 1)
## No Information Rate : 0.6618
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 1
##
## Mcnemar's Test P-Value : NA
##
## Sensitivity : 1.0000
## Specificity : 1.0000
## Pos Pred Value : 1.0000
## Neg Pred Value : 1.0000
## Prevalence : 0.6618
## Detection Rate : 0.6618
## Detection Prevalence : 0.6618
## Balanced Accuracy : 1.0000
##
## 'Positive' Class : benign
##
Thanks for your attention. You can now definitely say that you have worked with swag!