library(Canek)
# Functions
## Function to plot the pca coordinates
plotPCA <- function(pcaData = NULL, label = NULL, legPosition = "topleft"){
  # Color each cell by the integer code of its factor level
  col <- as.integer(label)
  plot(x = pcaData[,"PC1"], y = pcaData[,"PC2"],
       col = col, cex = 0.75, pch = 19,
       xlab = "PC1", ylab = "PC2")
  # Index colors by level so each legend entry matches levels(label)
  legend(legPosition, pch = 19,
         legend = levels(label),
         col = seq_along(levels(label)))
}
In this toy example we use the two simulated batches included in the SimBatches data from the Canek package. SimBatches is a list containing:
- batches: simulated scRNA-seq datasets with genes (rows) and cells (columns). Simulations were performed using Splatter.
- cell_types: a factor containing the cell-type labels of the batches.
lsData <- list(B1 = SimBatches$batches[[1]], B2 = SimBatches$batches[[2]])
batch <- factor(c(rep("Batch-1", ncol(lsData[[1]])),
rep("Batch-2", ncol(lsData[[2]]))))
celltype <- SimBatches$cell_types
table(batch)
#> batch
#> Batch-1 Batch-2
#> 631 948
table(celltype)
#> celltype
#> Cell Type 1 Cell Type 2 Cell Type 3 Cell Type 4
#> 1451 53 38 37
We perform Principal Component Analysis (PCA) on the joined datasets and draw a scatter plot of the first two principal components (PCs). The batch effect causes cells to group by batch rather than by cell type.
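
The PCA step described here can be sketched as follows. This is a minimal sketch, not necessarily the exact code behind the original figure: it assumes both batches share the same genes in the same row order, and the names `joined` and `pcaBefore` are illustrative. Constant genes are dropped first because `prcomp()` cannot scale zero-variance columns.

```r
library(Canek)

# Rebuild the inputs so this sketch runs on its own.
lsData <- list(B1 = SimBatches$batches[[1]], B2 = SimBatches$batches[[2]])
batch <- factor(rep(c("Batch-1", "Batch-2"), times = sapply(lsData, ncol)))

# Join the batches column-wise (assumption: identical genes, same row order).
joined <- as.matrix(do.call(cbind, lsData))

# Drop constant genes; prcomp() cannot rescale zero-variance columns.
joined <- joined[apply(joined, 1, var) > 0, , drop = FALSE]

# prcomp() expects observations in rows, so transpose to cells x genes.
pcaBefore <- prcomp(t(joined), center = TRUE, scale. = TRUE)$x

# Scatter plot of the first two PCs, colored by batch.
plot(pcaBefore[, "PC1"], pcaBefore[, "PC2"],
     col = as.integer(batch), pch = 19, cex = 0.75,
     xlab = "PC1", ylab = "PC2")
legend("bottomleft", pch = 19,
       legend = levels(batch),
       col = seq_along(levels(batch)))
```

Coloring the same coordinates by cell type instead of batch only requires swapping the factor passed to `col` and `legend`.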
We correct the toy batches using the function RunCanek. RunCanek accepts several input types; in this example we use the list of matrices created before.
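
A minimal sketch of the correction call, using default settings only. The variable name `corrected` is illustrative, and the sketch assumes that, for a list of matrices, RunCanek returns a single genes-by-cells matrix; inspect the result before relying on that.

```r
library(Canek)

# Rebuild the input list so this sketch runs on its own.
lsData <- list(B1 = SimBatches$batches[[1]], B2 = SimBatches$batches[[2]])

# Correct the batches with default arguments.
corrected <- RunCanek(lsData)

# Assumption: the result is one genes x cells matrix covering all input cells.
dim(corrected)
```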
We perform PCA of the corrected datasets and plot the first two PCs. After correction, the cells group by their corresponding cell type.
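
The same PCA can be repeated on the corrected data. This sketch makes two assumptions: RunCanek returns a genes-by-cells matrix, and its columns keep the order of the input batches, so that `SimBatches$cell_types` still lines up with the cells. The names `corrected` and `pcaAfter` are illustrative.

```r
library(Canek)

# Rebuild the inputs and the corrected data so this sketch runs on its own.
lsData <- list(B1 = SimBatches$batches[[1]], B2 = SimBatches$batches[[2]])
celltype <- SimBatches$cell_types
corrected <- as.matrix(RunCanek(lsData))

# Drop constant genes; prcomp() cannot rescale zero-variance columns.
corrected <- corrected[apply(corrected, 1, var) > 0, , drop = FALSE]

# PCA on cells x genes, as before the correction.
pcaAfter <- prcomp(t(corrected), center = TRUE, scale. = TRUE)$x

# Scatter plot of the first two PCs, colored by cell type
# (assumption: column order of the corrected matrix matches celltype).
plot(pcaAfter[, "PC1"], pcaAfter[, "PC2"],
     col = as.integer(celltype), pch = 19, cex = 0.75,
     xlab = "PC1", ylab = "PC2")
legend("topleft", pch = 19,
       legend = levels(celltype),
       col = seq_along(levels(celltype)))
```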
sessionInfo()
#> R version 4.1.3 (2022-03-10)
#> Platform: x86_64-apple-darwin13.4.0 (64-bit)
#> Running under: macOS Big Sur/Monterey 10.16
#>
#> Matrix products: default
#> BLAS/LAPACK: /Users/martin/miniconda3/envs/R_4.1.3/lib/libopenblasp-r0.3.18.dylib
#>
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] Canek_0.2.5
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.10 highr_0.10 DEoptimR_1.0-11
#> [4] bslib_0.4.2 compiler_4.1.3 bluster_1.4.0
#> [7] jquerylib_0.1.4 class_7.3-21 prabclus_2.3-2
#> [10] BiocNeighbors_1.12.0 numbers_0.8-5 tools_4.1.3
#> [13] digest_0.6.31 mclust_6.0.0 jsonlite_1.8.4
#> [16] evaluate_0.20 lattice_0.20-45 pkgconfig_2.0.3
#> [19] rlang_1.0.6 Matrix_1.5-3 igraph_1.3.5
#> [22] cli_3.6.0 rstudioapi_0.14 yaml_2.3.7
#> [25] parallel_4.1.3 xfun_0.37 fastmap_1.1.0
#> [28] knitr_1.42 cluster_2.1.4 sass_0.4.5
#> [31] S4Vectors_0.32.4 fpc_2.2-10 diptest_0.76-0
#> [34] nnet_7.3-18 stats4_4.1.3 grid_4.1.3
#> [37] robustbase_0.95-0 R6_2.5.1 flexmix_2.3-18
#> [40] BiocParallel_1.28.3 rmarkdown_2.20 irlba_2.3.5.1
#> [43] kernlab_0.9-32 magrittr_2.0.3 matrixStats_0.63.0
#> [46] modeltools_0.2-23 htmltools_0.5.4 BiocGenerics_0.40.0
#> [49] MASS_7.3-58.3 cachem_1.0.6 FNN_1.1.3.1