gdim estimates graph dimension using cross-validated eigenvalues, via the
graph-splitting technique developed in https://arxiv.org/abs/2108.03336.
The method computes a special type of cross-validated eigenvalue that
follows a simple central limit theorem, which allows users to perform
hypothesis tests on the rank of the graph.
You can install gdim from CRAN with:
install.packages("gdim")
# to get the development version from GitHub:
install.packages("pak")
pak::pak("RoheLab/gdim")
eigcv() is the main function in gdim. Besides the adjacency matrix, the
single required parameter for the function is the maximum possible
dimension, k_max.
In the following example, we generate a random graph from the stochastic block model (SBM) with 1000 nodes and 5 blocks (as such, we would expect the estimated graph dimension to be 5).
library(fastRG)
#> Loading required package: Matrix
B <- matrix(0.1, 5, 5)
diag(B) <- 0.3
model <- sbm(
  n = 1000,
  k = 5,
  B = B,
  expected_degree = 40,
  poisson_edges = FALSE,
  allow_self_loops = FALSE
)
A <- sample_sparse(model)
Here, A is the adjacency matrix.
Now, we call the eigcv() function with k_max = 10 to estimate graph dimension.
library(gdim)
eigcv_result <- eigcv(A, k_max = 10)
#> 'as(<dsCMatrix>, "dgCMatrix")' is deprecated.
#> Use 'as(., "generalMatrix")' instead.
#> See help("Deprecated") and help("Matrix-deprecated").
eigcv_result
#> Estimated graph dimension: 5
#>
#> Number of bootstraps: 10
#> Edge splitting probabaility: 0.1
#> Significance level: 0.05
#>
#> ------------ Summary of Tests ------------
#>   k          z        pvals         padj
#>   1 59.7488180 2.220446e-16 2.220446e-15
#>   2 12.9094629 2.220446e-16 2.220446e-15
#>   3 11.8600427 2.220446e-16 2.220446e-15
#>   4 11.9412340 2.220446e-16 2.220446e-15
#>   5  9.0252520 2.220446e-16 2.220446e-15
#>   6 -0.8512008 8.026711e-01 1.000000e+00
#>   7 -0.8182195 7.933841e-01 1.000000e+00
#>   8 -0.9912649 8.392219e-01 1.000000e+00
#>   9 -0.9005808 8.160944e-01 1.000000e+00
#>  10 -1.1677953 8.785553e-01 1.000000e+00
In this example, eigcv() suggests k = 5.
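To make the link between the summary table and the estimate concrete, here is a minimal base-R sketch. It assumes one-sided normal p-values and a Bonferroni adjustment across the k_max tests. Both assumptions are inferred from the printed numbers, not taken from the gdim source, and the variable names (z, pvals, padj, k_hat) are illustrative only.

```r
# z statistics copied from the summary table printed by eigcv() above
z <- c(59.7488180, 12.9094629, 11.8600427, 11.9412340, 9.0252520,
       -0.8512008, -0.8182195, -0.9912649, -0.9005808, -1.1677953)
alpha <- 0.05  # significance level reported in the output

# Assumption: one-sided test, where a large positive z is evidence
# that the k-th dimension is present
pvals <- pnorm(z, lower.tail = FALSE)

# Assumption: Bonferroni adjustment across the 10 tests
padj <- p.adjust(pvals, method = "bonferroni")

# Estimated dimension: number of leading k values rejected before
# the first test that fails to reject
not_rejected <- which(padj >= alpha)
k_hat <- if (length(not_rejected) == 0) length(z) else not_rejected[1] - 1
k_hat
```

Under these assumptions the sketch recovers the same estimate of 5 and reproduces the larger p-values in the table. The smallest p-values printed by eigcv() all equal 2.220446e-16, which suggests the package floors them at machine precision; the sketch does not do that.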
To visualize the result, use plot(), which returns a ggplot object. The
function displays the test statistic (z score) for each hypothesized
graph dimension.
plot(eigcv_result)
Chen, Fan, Sebastien Roch, Karl Rohe, and Shuqi Yu. “Estimating Graph Dimension with Cross-Validated Eigenvalues.” ArXiv:2108.03336 [Cs, Math, Stat], August 6, 2021. https://arxiv.org/abs/2108.03336.