The basic way to assess the importance of attributes in a linear regression is to compare their regression coefficients. However, when a model includes many independent variables, we generally cannot guarantee that they are independent of one another. In other words, several attributes are likely to be collinear, that is, highly correlated. In a textbook example, we can simply remove the highly correlated attributes and then run the regression. In real-world business cases, however, all the selected attributes are important and meaningful, so we cannot arbitrarily remove the ones that are highly correlated. We therefore need a way to calculate the importance of attributes when several of them are collinear.
Shapley Value regression is also called Shapley regression, Shapley Value analysis, Kruskal analysis, dominance analysis, and incremental R-squared analysis. Besides linear regression with moderately to highly correlated independent variables, it can also be used to compute the contribution of each predictor in machine learning models.
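For reference, the textbook Shapley decomposition of R-squared can be written as below. This is the general game-theoretic definition, given here as an assumption about what "Shapley Value regression" computes rather than a statement taken from the package source. The symbols are introduced only for this illustration: $P$ is the set of predictors, $p = |P|$, and $R^2(S)$ is the R-squared of the regression that uses only the predictors in $S$, with $R^2(\emptyset) = 0$.

$$
\phi_j \;=\; \sum_{S \subseteq P \setminus \{j\}} \frac{|S|!\,(p - |S| - 1)!}{p!}\,\Bigl[R^2\bigl(S \cup \{j\}\bigr) - R^2(S)\Bigr],
\qquad
\sum_{j \in P} \phi_j \;=\; R^2(P).
$$

Under this definition, each predictor's value is its average marginal gain in R-squared over all orders in which the predictors could enter the model, which is why correlated predictors share credit instead of one of them absorbing it.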
This package has only one function, `shapleyvalue`, which you can use to analyze the relative importance of attributes in linear regression.
Here, we use the built-in dataset `Boston` from the package `MASS`. In this demo, `medv` is the dependent variable and `nox`, `rm`, `age`, and `dis` are the four predictors; we want to find out the importance of each predictor.
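library(MASS)        # provides the Boston dataset
library(kableExtra)  # provides kbl(), kable_classic(), and the %>% pipe
library(ShapleyValue)

data <- Boston
head(data) %>%
  kbl() %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")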
crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | black | lstat | medv |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.00632 | 18 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1 | 296 | 15.3 | 396.90 | 4.98 | 24.0 |
0.02731 | 0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 396.90 | 9.14 | 21.6 |
0.02729 | 0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 392.83 | 4.03 | 34.7 |
0.03237 | 0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 394.63 | 2.94 | 33.4 |
0.06905 | 0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 396.90 | 5.33 | 36.2 |
0.02985 | 0 | 2.18 | 0 | 0.458 | 6.430 | 58.7 | 6.0622 | 3 | 222 | 18.7 | 394.12 | 5.21 | 28.7 |
y <- data$medv                   # dependent variable
x <- as.data.frame(data[, 5:8])  # predictors: nox, rm, age, dis
value <- shapleyvalue(y, x)
value %>%
  kbl() %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")
| | nox | rm | age | dis |
---|---|---|---|---|
Shapley Value | 0.0836 | 0.3938 | 0.0573 | 0.0272 |
Standardized Shapley Value | 0.1488 | 0.7009 | 0.1020 | 0.0483 |
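
To make the numbers above easier to interpret, here is a minimal base-R sketch of the standard Shapley decomposition of R-squared, computed by brute force with `lm()`. The helper name `shapley_r2` is hypothetical and is not part of the ShapleyValue package; the sketch assumes the package follows the usual definition (weighted average of marginal R-squared gains over all predictor subsets), which has not been checked against its source.

```r
# Hypothetical helper (not part of the ShapleyValue package): brute-force
# Shapley decomposition of R-squared for a linear regression.
shapley_r2 <- function(y, x) {
  p <- ncol(x)
  vars <- colnames(x)

  # R-squared of the model that uses only the predictors named in `subset`.
  # The empty model explains nothing, so its R-squared is taken as 0.
  r2 <- function(subset) {
    if (length(subset) == 0) return(0)
    d <- data.frame(resp = y, x[, subset, drop = FALSE])
    summary(lm(resp ~ ., data = d))$r.squared
  }

  phi <- setNames(numeric(p), vars)
  for (j in seq_len(p)) {
    others <- setdiff(vars, vars[j])
    for (k in 0:(p - 1)) {
      # All subsets of size k drawn from the other predictors.
      subsets <- if (k == 0) list(character(0)) else combn(others, k, simplify = FALSE)
      # Shapley weight for a subset of size k out of p predictors.
      weight <- factorial(k) * factorial(p - k - 1) / factorial(p)
      for (s in subsets) {
        # Marginal gain in R-squared from adding predictor j to subset s.
        phi[j] <- phi[j] + weight * (r2(c(s, vars[j])) - r2(s))
      }
    }
  }
  phi
}

library(MASS)  # for the Boston dataset
y <- Boston$medv
x <- Boston[, c("nox", "rm", "age", "dis")]

phi <- shapley_r2(y, x)
phi             # raw Shapley values; they sum to the full-model R-squared
phi / sum(phi)  # standardized values; they sum to 1
```

If the package implements the same definition, `phi` should be close to the "Shapley Value" row and `phi / sum(phi)` to the "Standardized Shapley Value" row. The number of model fits grows exponentially with the number of predictors, so this brute-force approach is only practical for a small set of attributes.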