modelc is an R model object to SQL compiler. It generates SQL select statements from linear and generalized linear models.
Its interface currently consists of a single function,
modelc
, which takes a single input, namely an
lm
or glm
model object.
It currently supports Gaussian and gamma family distributions using log or identity link functions.
To import linear models directly to your SQL Server database,
consider using Castpack, which depends
on modelc
.
Supposing the following data
<- 1:10
a <- 2*1:10 + runif(1) * 1.5
b <- as.factor(1:10)
c <- data.frame(a,b,c)
df = b ~ a + c formula
A vanilla linear model
<- lm(formula, data=df)
linear_model modelc(linear_model)
generates the following SQL
0.231808555545287 + 2 * `a` + (
CASE
WHEN c = 2 THEN -0.00000000000000193216758587821 * c
WHEN c = 3 THEN -0.000000000000000776180314897008 * c
WHEN c = 4 THEN -0.000000000000000665297412768863 * c
WHEN c = 5 THEN -0.00000000000000055441451064072 * c
WHEN c = 6 THEN -0.000000000000000887620818362638 * c
WHEN c = 7 THEN -0.000000000000000332648706384432 * c
WHEN c = 8 THEN -0.00000000000000110994422395641 * c
WHEN c = 9 THEN -0.00000000000000188723974152839 * c
WHEN c = 10 THEN 0 * c
END
)
GLMs are also supported with log or identity link functions
<- glm(formula, data=df, family=Gamma(link="log"))
glm_model modelc(glm_model)
EXP(
0.557874070609732 + 0.244938197625494 * `a` + (
CASE
WHEN c = 2 THEN 0.394878990324516 * c
WHEN c = 3 THEN 0.536977925025217 * c
WHEN c = 4 THEN 0.570378881020516 * c
WHEN c = 5 THEN 0.542936294999294 * c
WHEN c = 6 THEN 0.476536561025273 * c
WHEN c = 7 THEN 0.383038044594683 * c
WHEN c = 8 THEN 0.269593156578649 * c
WHEN c = 9 THEN 0.140849942185343 * c
WHEN c = 10 THEN 0 * c
END
) )
<- glm(formula, data=df, family=Gamma(link="identity"))
glm_model_idlink modelc(glm_model_idlink)
0.231808555545287 + 2 * `a` + (
CASE
WHEN c = 2 THEN 0.00000000000000139594865689472 * c
WHEN c = 3 THEN -0.000000000000000581567338978993 * c
WHEN c = 4 THEN -0.00000000000000111588502938831 * c
WHEN c = 5 THEN 0.000000000000000967650035758108 * c
WHEN c = 6 THEN -0.00000000000000149265067586469 * c
WHEN c = 7 THEN -0.000000000000000100985345060517 * c
WHEN c = 8 THEN -0.0000000000000000673235633736781 * c
WHEN c = 9 THEN 0.00000000000000199047558220559 * c
WHEN c = 10 THEN 0 * c
END
)
In order to avoid generating invalid SQL, modelc
temporarily sets your scipen
option to 999.
Using devtools
:
install.packages("devtools")
install.packages("remotes")
::install_github("sparkfish/modelc") remotes
Note that you may encounter minor differences between the output of your R and generated SQL models depending on the precision with which your numeric types are represented in the database. To ensure parity between the two models, numeric types should have a precision of at least 17.
Tests are written using testthat
. To run them, simply
do
::test() devtools