The dbplyr
package abstracts away many aspects of database interaction. However,
code that uses dbplyr must
construct an appropriate database connection, and use it to create any
dplyr::tbl()
instances that are used to access data. This
creates two problems for code that one might want to distribute. First,
the connection-building DBI::dbConnect()
or
src_whatever
statement typically contains credentials used
to authenticate to the database server. This creates a security risk, as
sending around personal credentials isn’t a good idea for a variety of
reasons, and using the same “service credentials” for many users makes
tracking usage more difficult. It also makes it harder to reuse code,
since it has to be edited to accommodate different users or different
databases. Second, the connection info will often include other
parameters, such as schema, data transfer settings, or the like. Users
have adopted a number of solutions to these problems, such as reading
environment variables or sourcing a file in a known location. Each works
well in some circumstances, but runs into problems with scope or
logistics in others.
The {srcr} package provides an alternative in this area that aims to
be easy enough to use for straightforward cases, and flexible enough to
adapt to complex projects and environments. Its workhorse is the
srcr()
function. As its documentation explains,
srcr()
is an adapter that lets you create a data source of
a type known to DBI
or old-style dplyr connection
functions, using configuration data passed to srcr()
, or,
more often, supplied in a configuration file.
Configuration files provide a simple way to represent in JSON the information needed to construct a data source. The JSON must define a single object (or hash), which is translated into a list structure within R. Two keys from the object define the database connection.
src_name
SQLite
or PostgreSQL
(N.B. the initial
R
in the package name is not included), or the name of an
old-style dplyr
function used to construct a data source, typically a database
connection, such as src_postgres
or src_mysql
.
src_args
Here’s a typical example:
{
"src_name" : "Postgres",
"src_args" : {
"host" : "my.database.server",
"port" : 5432,
"dbname" : "project_db",
"username" : "my_credential",
"password" : "DontL00k@This",
"options" : "-c search_path=schema0,schema1,schema2"
},
"post_connect_sql" : [
"set role project_staff;"
],
"post_connect_fun: [
"function (db) {",
"DBI::dbExecute(db, paste0('insert into connect_log (user, date) ',",
" 'values (\'', 'my_credential', '\', \'', Sys.Date(), '\')'))",
"set.seed(271828)",
"message('Session setup complete')",
"}"
]
}
If you’re deriving the configuration information programatically, you
can pass it directly to srcr()
via the config
argument, but it’s perhaps more common that configuration remains the
same across different connections for a given project. For these cases,
{srcr} encourages separation of configuration from code.
Once the connection is established, there may be additional work to do. For example, session settings may need to be altered, or schemas to search specified, or authorization roles changed. There are two additional keys that srcr provides to address this:
post_connect_sql
post_connect_fun
srcr()
.
In both cases, this makes it possible to execute additional code
specified in the configuration file. Since this creates the possibility
that unknown code may be executed and produce unwanted effects, you need
to opt in to each, using the allow_post_connect
parameter
to srcr()
.
The {srcr} package tries to provide you with a lot of flexibility in
the way you deploy your code and configuration, by letting you get the
latter to srcr()
in a variety of ways. The first option
that returns valid JSON is used, and later options aren’t checked;
srcr()
does not try to merge data from more than one
source.
srcr()
what to read directlyIf you know where your configuration data lives, you can point
srcr()
directly to it, using the paths
argument. This is a vector of paths to check, so you can provide a
series of places to look, and srcr()
will use the first one
it finds. Each place can be a path to a local file, or a URL that
returns JSON. (As an implementation detail, since srcr()
uses jsonlite::fromJSON
under the hood, paths
can also contain a JSON string rather
than a pointer to outside configuration data. We don’t make any promises
about this, as jsonlite::fromJSON
might change someday, but it can be a handy way to provide fallback
configuration information after having srcr()
check for an
outside resource.)
If you just want to supply the configuration data directly, or obtain
it from a source that doesn’t speak JSON, you can pass an
appropriately-structured list of lists directly to srcr()
via the config
parameter, which overrides the search for
configuration files.
Through the find_config_files()
function (implicitly
called by srcr()
if there’s no paths
or
config
provided), {srcr} tries to support a number of
common deployment styles through its use of default search locations for
configuration files. For those who prefer per-user config files, it will
look in your home directory, or, for those who like to collect you
config files out of the way, in $HOME/.srcr
. If you prefer
to deploy configuration data with your application, you can put the
configuration file in the same directory as your main application
program. Finally, you can put the configuration file in the same
directory as library code that calls srcr()
either directly
or through one intermediate call. Note that the current working
directory isn’t in this search path by default; this is a measure
intended to minimize the harm someone can cause by invoking your program
in a working directory that might subvert the intended connection,
either intentionally or through accidental use of a common configuration
file name.
Similarly, srcr()
will try by default to find files with
the same basename as your application, or as the library file(s) making
the call to srcr()
. Optionally, the file can have a “type”
(i.e. suffix) of .json
or .conf
, or none at
all.
If these options don’t suit your deployment strategy, you can provide
explicit hints to srcr()
using the dirs
,
basenames
, and suffices
arguments.
Finally, to accommodate convention on Unix-like systems,
find_config_files()
first checks for a “hidden” file with a
leading .
before checking for the plain basename.
It’s worth noting that srcr()
expects any configuration
files it finds to contain JSON, regardless of the file suffix. This was
done to minimize the number of other packages that are prerequisites for
{srcr}. If you prefer another format, such as YAML or Apache, you can
get there pretty easily (example simplified a bit):
my_srcr <- function(...) {
files <- find_config_files(...)
mydata <- read_my_config_format(files)
srcr(config = mydata)
}
If you need to specify where to look at runtime, you can use the
environment variable BASENAME_CONFIG
to point to a
configuration file, where BASENAME is one of the basenames
find_config_files()
would usually check (see below). One
note: srcr()
will only pay attention to this environment
variable if it points to an actual file, not a URL or JSON string. This
is construed as a feature, in that it may limit the damage someone can
inflict by fiddling with the environment. If you trust the environment,
you can be more permissive by writing something like
my.paths <- c()
for (bn in my.basenames) {
my.info <- Sys.getenv(paste0(toupper(bn), '_CONFIG'))
if (my.info != '') my.paths <- c(my.paths, my.info)
}
src <- if (length(paths) > 0) srcr(paths = my.paths) else srcr(other.args)
The srcr()
function is intended to help manage database
connections, and contains a few optimizations for this (such as turning
src_name
into a driver argument if it doesn’t start with
‘src_’). But at its root, it’s a function that takes a name and
arguments from configuration data, executes them (and maybe some
post-processing), and returns the result. You may find this pattern
useful in other contexts. If so, you can implement your own function
named “src_whatever”, and use srcr()
to allow it
to access configuration.
Enjoy!