swag
swag is a package that trains a meta-learning procedure combining screening and wrapper methods to find a set of extremely low-dimensional attribute combinations.
First install the remotes package. Then install swag with the following code:
## if not installed
## install.packages("remotes")
remotes::install_github("SMAC-Group/SWAG-R-Package")
library(swag) #load the new package
We use the BreastCancer dataset, readily available from the mlbench package, to give an overview of swag.
# After having installed the mlbench package
data(BreastCancer, package = "mlbench")
# Pre-processing of the data
y <- BreastCancer$Class # response variable
x <- as.matrix(BreastCancer[setdiff(names(BreastCancer), c("Id", "Class"))]) # features
# remove missing values and change to 'numeric'
id <- which(apply(x, 1, function(x) sum(is.na(x))) > 0)
y <- y[-id]
x <- x[-id, ]
x <- apply(x, 2, as.numeric)
# Training and test set
set.seed(180) # for replication
ind <- sample(1:dim(x)[1], dim(x)[1] * 0.2)
y_test <- y[ind]
y_train <- y[-ind]
x_test <- x[ind, ]
x_train <- x[-ind, ]
Now we are ready to train with swag! The first step is to define the meta-parameters of the swag procedure: \(p_{max}\), the maximum dimension of attributes; \(\alpha\), a performance quantile which represents the percentage of learners selected at each dimension; and \(m\), the maximum number of learners trained at each dimension. We can set all these meta-parameters, together with a seed for replicability and verbose = TRUE to get a message as each dimension is completed, thanks to the swagControl() function, which behaves similarly to the trControl argument of caret.
# Meta-parameters chosen for the breast cancer dataset
swagcon <- swagControl(pmax = 4L,
                       alpha = 0.5,
                       m = 20L,
                       seed = 163L, # for replicability
                       verbose = TRUE) # keeps track of completed dimensions
# Given the low dimensional dataset, we can afford a wider search
# by fixing alpha = 0.5 as a smaller alpha may also stop the
# training procedure earlier than expected.
Having set up the meta-parameters as explained above, we are now ready to train with swag. We start with the linear Support Vector Machine learner:
### SVM Linear Learner ###
train_swag_svml <- swag(
  # arguments for swag
  x = x_train,
  y = y_train,
  control = swagcon,
  auto_control = FALSE,
  # arguments for caret
  trControl = caret::trainControl(method = "repeatedcv", number = 10, repeats = 1, allowParallel = F),
  metric = "Accuracy",
  method = "svmLinear", # use method = "svmRadial" to train this alternative learner
  preProcess = c("center", "scale")
)
## [1] "Dimension explored: 1 - CV errors at alpha: 0.115"
## [1] "Dimension explored: 2 - CV errors at alpha: 0.0549"
## [1] "Dimension explored: 3 - CV errors at alpha: 0.0403"
## [1] "Dimension explored: 4 - CV errors at alpha: 0.0394"
The only difference with respect to the classic caret train function is the specification of the swag arguments explained previously. In the above chunk for the svmLinear learner, we define the estimator of the out-of-sample accuracy as 10-fold cross-validation repeated once. For this specific case, we have chosen to center and rescale the data, as is usually done for SVMs, and the parameter that controls the margin in SVMs is automatically fixed at a unitary value (i.e. \(c=1\)).
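Should you want to explore different margin parameters instead, caret's usual tuneGrid argument can be passed through swag(). The following is a minimal sketch, not part of the original analysis: the column name C follows caret's convention for method = "svmLinear", and the grid values are purely illustrative.

# Hypothetical variation: tune the SVM cost parameter instead of fixing c = 1.
train_swag_svml_tuned <- swag(
  x = x_train,
  y = y_train,
  control = swagcon,
  auto_control = FALSE,
  trControl = caret::trainControl(method = "repeatedcv", number = 10, repeats = 1, allowParallel = F),
  metric = "Accuracy",
  method = "svmLinear",
  preProcess = c("center", "scale"),
  tuneGrid = expand.grid(C = c(0.1, 1, 10)) # illustrative grid of cost values
)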
Let’s have a look at the typical output of a swag training object for the svmLinear learner:
train_swag_svml$CVs
## [[1]]
## [1] 0.14094276 0.06959836 0.07499399 0.15157407 0.10811688 0.08592593 0.11502886
## [8] 0.12070707 0.22122896
##
## [[2]]
## [1] 0.05107744 0.06225950 0.03852213 0.05492304 0.06030544 0.04377104
## [7] 0.05108225 0.06212121 0.07485570 0.05491582
##
## [[3]]
## [1] 0.04010101 0.04761063 0.03848846 0.04030784 0.04575758 0.04016835 0.03841991
## [8] 0.04387205 0.05105099
##
## [[4]]
## [1] 0.03464646 0.04572751 0.04030664 0.03852213
# A list which contains the cv training errors of each learner explored in a given dimension
train_swag_svml$VarMat
## [[1]]
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,] 1 2 3 4 5 6 7 8 9
##
## [[2]]
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 2 2 2 2 3 3 3 5 5 6
## [2,] 3 5 6 7 5 6 7 6 7 7
##
## [[3]]
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,] 2 2 2 3 2 2 3 3 5
## [2,] 3 3 6 6 3 5 5 5 6
## [3,] 6 7 7 7 5 6 6 7 7
##
## [[4]]
## [,1] [,2] [,3] [,4]
## [1,] 2 2 2 3
## [2,] 3 3 5 5
## [3,] 6 5 6 6
## [4,] 7 6 7 7
# A list containing a matrix, for each dimension, with the attributes tested at that step
train_swag_svml$cv_alpha
## [1] 0.11502886 0.05491943 0.04030784 0.03941438
# The cut-off cv training error, at each dimension, determined by the choice of alpha
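These elements are all you need to locate the best learner found by the procedure. The following base-R sketch (using only the element names shown above) recovers the dimension with the lowest CV error, the best learner within that dimension, and its attribute indices:

# A sketch based on the output elements shown above.
best_dim <- which.min(sapply(train_swag_svml$CVs, min)) # dimension with the lowest CV error
best_learner <- which.min(train_swag_svml$CVs[[best_dim]]) # best learner within that dimension
train_swag_svml$VarMat[[best_dim]][, best_learner] # attributes used by the best learner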
The other two learners that we have implemented in swag are the lasso (glmnet package required) and the random forest (randomForest package required, since we rely on caret's "rf" method). The training phase for these learners differs a little with respect to the SVM one. We can look at the random forest for a practical example:
### Random Forest Learner ###
train_swag_rf <- swag(
  # arguments for swag
  x = x,
  y = y,
  control = swagcon,
  auto_control = FALSE,
  # arguments for caret
  trControl = caret::trainControl(method = "repeatedcv", number = 10, repeats = 1, allowParallel = F),
  metric = "Accuracy",
  method = "rf",
  # dynamically modify arguments for caret
  caret_args_dyn = function(list_arg, iter){
    list_arg$tuneGrid = expand.grid(.mtry = sqrt(iter))
    list_arg
  }
)
## [1] "Dimension explored: 1 - CV errors at alpha: 0.0996"
## [1] "Dimension explored: 2 - CV errors at alpha: 0.0534"
## [1] "Dimension explored: 3 - CV errors at alpha: 0.0461"
## [1] "Dimension explored: 4 - CV errors at alpha: 0.0425"
The newly introduced argument caret_args_dyn enables the user to modify the hyper-parameters of a given learner dynamically, since they can change as the dimension grows up to the desired \(p_{max}\). In the example above, this allows us to adapt the mtry hyper-parameter as the dimension grows: we have fixed mtry to the square root of the number of attributes at each step, as is usually done in practice.
You can tailor the learning arguments of swag() as you like, for example by introducing grids for the hyper-parameters specific to a given learner, or by updating these grids as the dimension increases, similarly to what is usually done with the caret package. This gives you a wide range of possibilities and a lot of flexibility in the training phase, as sketched below.
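For instance, here is a minimal sketch of how one might train the lasso learner with a custom penalty grid. It assumes the lasso learner accepts caret arguments in the same way as the SVM and random forest examples above; the column names .alpha and .lambda follow caret's conventions for method = "glmnet" (with .alpha = 1 giving the lasso penalty), and the lambda values are illustrative only, not taken from the original analysis.

# Hypothetical example: lasso learner trained over an illustrative penalty grid.
train_swag_lasso <- swag(
  x = x_train,
  y = y_train,
  control = swagcon,
  auto_control = FALSE,
  trControl = caret::trainControl(method = "repeatedcv", number = 10, repeats = 1, allowParallel = F),
  metric = "Accuracy",
  method = "glmnet",
  tuneGrid = expand.grid(.alpha = 1, # alpha = 1 gives the lasso penalty
                         .lambda = 10^seq(-4, 0, length.out = 20)) # illustrative values
)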
To conclude this brief introduction, we present the usual predict() function, which can be applied to a swag-trained object similarly to many other packages in R. We pick the random forest learner for this purpose.
# best learner predictions
# if `newdata` is not specified, then predict gives predictions based on the training
# sample
sapply(predict(object = train_swag_rf), function(x) head(x))
## $predictions
## [,1]
## [1,] 1
## [2,] 1
## [3,] 1
## [4,] 1
## [5,] 1
## [6,] 2
##
## $models
## $models[[1]]
## [1] 3 5 6 7
# best learner predictions
best_pred <- predict(object = train_swag_rf,
                     newdata = x_test)
sapply(best_pred, function(x) head(x))
## $predictions
## [,1]
## [1,] 1
## [2,] 1
## [3,] 1
## [4,] 2
## [5,] 1
## [6,] 1
##
## $models
## $models[[1]]
## [1] 3 5 6 7
# predictions for a given dimension
dim_pred <- predict(
  object = train_swag_rf,
newdata = x_test,
type = "attribute",
attribute = 4L)
sapply(dim_pred,function(x) head(x))
## $predictions
## [,1] [,2] [,3] [,4]
## [1,] 1 1 1 1
## [2,] 1 1 1 1
## [3,] 1 1 1 1
## [4,] 2 2 2 2
## [5,] 1 1 1 1
## [6,] 1 1 1 1
##
## $models
## $models[[1]]
## [1] 2 3 5 6
##
## $models[[2]]
## [1] 2 3 5 7
##
## $models[[3]]
## [1] 3 5 6 7
##
## $models[[4]]
## [1] 2 3 6 7
# predictions below a given CV error
cv_pred <- predict(
  object = train_swag_rf,
newdata = x_test,
type = "cv_performance",
cv_performance = 0.04)
sapply(cv_pred,function(x) head(x))
## $predictions
## [,1]
## [1,] 1
## [2,] 1
## [3,] 1
## [4,] 2
## [5,] 1
## [6,] 1
##
## $models
## $models[[1]]
## [1] 3 5 6 7
Now we can evaluate the performance of the best learner selected by swag thanks to the confusionMatrix() function of caret.
# transform predictions into a factor with the levels of `y_test`
best_learn <- factor(levels(y_test)[best_pred$predictions])
caret::confusionMatrix(best_learn, y_test)
## Confusion Matrix and Statistics
##
## Reference
## Prediction benign malignant
## benign 90 0
## malignant 0 46
##
## Accuracy : 1
## 95% CI : (0.9732, 1)
## No Information Rate : 0.6618
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 1
##
## Mcnemar's Test P-Value : NA
##
## Sensitivity : 1.0000
## Specificity : 1.0000
## Pos Pred Value : 1.0000
## Neg Pred Value : 1.0000
## Prevalence : 0.6618
## Detection Rate : 0.6618
## Detection Prevalence : 0.6618
## Balanced Accuracy : 1.0000
##
## 'Positive' Class : benign
##
Thanks for your attention. You can now definitely say that you have worked with swag!