This vignette discusses some of the common programming concepts
and conventions that have been adopted within the {admiral}
family of packages. It is intended to be a user-facing version of the
Programming Strategy vignette, but users can also read the latter after
becoming more familiar with the package to expand further on any topics
of interest. For answers to some common {admiral} questions, visit the
corresponding FAQ page, available in the same drop-down menu as this
vignette.
It is expected that the input dataset is not grouped. Otherwise, an error is issued.
The output dataset is ungrouped. The observations are not sorted in any particular way. In particular, the order of the observations of the input dataset may not be preserved.
{admiral} Functions and Options
As a general principle, the behavior of the {admiral} functions is
determined only by their input, not by any global object, i.e. all inputs
like datasets, variable names, options, etc. must be provided to the
function by arguments. Correspondingly, in general functions do not have
any side-effects like creating or modifying global objects, printing,
writing files, etc.
An exception to the above principle is found in our approach to
package options (see get_admiral_option() and set_admiral_options()),
which allow for user-defined defaults on commonly used function
arguments. For instance, the option subject_keys is currently
pre-defined as exprs(STUDYID, USUBJID), but can be modified using
set_admiral_options(subject_keys = exprs(...)) at the top of a script.
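As a minimal sketch of how such an option could be inspected and overridden: the variable USUBJID2 below is purely hypothetical and only serves to show that the stored default changes. The default is restored at the end so that the sketch leaves the session unchanged.

```r
library(admiral)
library(rlang)

# Inspect the pre-defined default for the subject keys
default_keys <- get_admiral_option("subject_keys")
default_keys

# Override the default (USUBJID2 is a hypothetical variable name)
set_admiral_options(subject_keys = exprs(STUDYID, USUBJID2))
custom_keys <- get_admiral_option("subject_keys")
custom_keys

# Restore the original default
set_admiral_options(subject_keys = default_keys)
restored_keys <- get_admiral_option("subject_keys")
```

Setting the option once at the top of a script means that functions relying on this default pick up the study-specific keys without the argument being repeated in every call.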
For a full discussion of {admiral} inputs, outputs and options, see the corresponding section of our developer-facing Programming Strategy.
When using the {haven} package to read SAS datasets into R, SAS-style
character missing values, i.e. "", are not converted into proper R NA
values. Rather they are kept as is. This is problematic for any
downstream data processing as R handles "" just as any other string.
Thus, before any data manipulation is performed, SAS blanks should be
converted to R NAs using {admiral}’s convert_blanks_to_na() function.
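A minimal sketch of the conversion is shown here. The data frame dm_raw is dummy data standing in for a dataset read via {haven}; in a real script the call would typically be applied directly to the result of reading the SAS file.

```r
library(admiral)

# Dummy data standing in for a SAS dataset read via {haven};
# the blank ARM value mimics a SAS-style character missing value
dm_raw <- data.frame(
  USUBJID = c("XXX-1", "XXX-2"),
  ARM = c("Placebo", ""),
  stringsAsFactors = FALSE
)

# Before conversion the blank is an ordinary string, not NA
is.na(dm_raw$ARM)
#> [1] FALSE FALSE

# After conversion the blank has become a proper NA
dm_clean <- convert_blanks_to_na(dm_raw)
is.na(dm_clean$ARM)
#> [1] FALSE  TRUE
```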
Note that a comparison involving an NA value, e.g. with == or !=,
returns NA rather than TRUE or FALSE.
visits <- c("Baseline", NA, "Screening", "Week 1 Day 7")
visits != "Baseline"
#> [1] FALSE NA TRUE TRUE
In contrast, is.na() returns TRUE if the input is NA.
Thus, to keep all visits which are not "Baseline", including those with
a missing value, the condition must also test for NA explicitly using
is.na().
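Such a condition can be sketched as follows, reusing the visits vector from above. The name keep is only illustrative.

```r
visits <- c("Baseline", NA, "Screening", "Week 1 Day 7")

# Combine the comparison with an explicit NA check so that
# the missing visit is kept instead of yielding NA
keep <- visits != "Baseline" | is.na(visits)
keep
#> [1] FALSE  TRUE  TRUE  TRUE

visits[keep]
#> [1] NA             "Screening"    "Week 1 Day 7"
```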
Also note that most aggregation functions, like mean() or max(), also
return NA if any element of the input vector is missing.
To avoid this behavior one has to explicitly set na.rm = TRUE.
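A short base R illustration of this behavior, using a made-up vector of values:

```r
values <- c(1, NA, 2)

# A single missing element makes the result missing
mean(values)
#> [1] NA

# Missing elements are dropped before aggregating
mean(values, na.rm = TRUE)
#> [1] 1.5
max(values, na.rm = TRUE)
#> [1] 2
```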
This is very important to keep in mind when using {admiral}’s
aggregation functions such as derive_summary_records().
For handling of NAs in sorting variables see Sort Order.
expr(), exprs(), !! and !!!
expr() and exprs()
expr() is a function from the {rlang} package, which is used to create
an expression. The expression is not evaluated - rather, it is passed on
to the derivation function which evaluates it in its own environment.
exprs() is the plural version of expr(), so it accepts multiple
comma-separated items and returns a list of expressions.
library(rlang)
adae <- data.frame(USUBJID = "XXX-1", AEDECOD = "HEADACHE")
# Return the adae object
adae
#> USUBJID AEDECOD
#> 1 XXX-1 HEADACHE
# Return an expression
expr(adae)
#> adae
When used within the context of an {admiral} derivation function,
expr() and exprs() allow the function to evaluate the expressions in the
context of the input dataset. As an example, expr() and exprs() allow
users to pass variable names of datasets to the function without
wrapping them in quotation marks.
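The idea of evaluating an expression in the context of a dataset can be sketched with eval_tidy() from {rlang}. This is only an illustration of the principle and makes no claim about how {admiral} functions are implemented internally. The adae data below is dummy data.

```r
library(rlang)

adae <- data.frame(
  USUBJID = c("XXX-1", "XXX-2"),
  AEDECOD = c("HEADACHE", "NAUSEA")
)

# Variable names are written without quotation marks; nothing is evaluated yet
condition <- expr(AEDECOD == "HEADACHE")
by_vars <- exprs(USUBJID, AEDECOD)
length(by_vars)
#> [1] 2

# Only when a dataset is supplied is AEDECOD looked up as a column
eval_tidy(condition, data = adae)
#> [1]  TRUE FALSE
```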
The expressions framework is powerful because users are able to
intuitively “inject code” into {admiral} functions (through the function
parameters) using very similar syntax as if they were writing open code,
with the exception possibly being an outer exprs() wrapper. For
instance, in a derive_vars_merged() call that merges adsl with ex, the
user is able to filter ex prior to the merge using an expression passed
to the filter_add parameter. Because filter_add accepts expressions, the
user has full power to filter their dataset as they please. In the same
vein, the user is able to create any new variables they wish after the
merge using the new_vars argument, to which they pass a list of
expressions containing “standard” R code.
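A call of this kind might look like the sketch below. The adsl and ex datasets are dummy data, and the new variable names LSTEXDTC and COMPTRT are chosen for illustration only.

```r
library(dplyr, warn.conflicts = FALSE)
library(admiral)

# Dummy subject-level and exposure data
adsl <- tribble(
  ~STUDYID, ~USUBJID,
  "XXX",    "XXX-1",
  "XXX",    "XXX-2"
)
ex <- tribble(
  ~STUDYID, ~USUBJID, ~EXSEQ, ~EXENDTC,
  "XXX",    "XXX-1",  1,      "2020-01-05",
  "XXX",    "XXX-1",  2,      "2020-02-10",
  "XXX",    "XXX-2",  1,      NA
)

result <- derive_vars_merged(
  adsl,
  dataset_add = ex,
  # Filter ex before the merge using ordinary R code
  filter_add = !is.na(EXENDTC),
  by_vars = exprs(STUDYID, USUBJID),
  # Create new variables from expressions evaluated in ex
  new_vars = exprs(
    LSTEXDTC = EXENDTC,
    COMPTRT = if_else(!is.na(EXENDTC), "Y", "N")
  ),
  # Per subject, keep the record with the latest end date
  order = exprs(EXENDTC),
  mode = "last"
)
result
```

Subject XXX-2 has no exposure record left after filtering, so both new variables are expected to be NA for that subject.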
Bang-Bang (!!) and Bang-Bang-Bang (!!!)
Sometimes you may want to construct an expression using other, pre-existing expressions. However, it’s not immediately clear how to achieve this because expressions inherently pause evaluation of your code before it’s executed.
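The difficulty can be illustrated with two small expressions, a and b, introduced here purely for demonstration:

```r
library(rlang)

a <- expr(2)
b <- expr(3)

# The names a and b are captured as-is; their contents are not looked up
combined <- expr(a + b)
combined
#> a + b
```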
This is where !! (bang-bang) comes in: provided again by the {rlang}
package, it allows you to inject the contents of an expression into
another expression, meaning that by using !! you can modify the code
inside an expression before R evaluates it. By using !! you are
unquoting an expression, i.e. evaluating it before you pass it onwards.
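A small sketch of unquoting, again using two illustrative expressions a and b:

```r
library(rlang)

a <- expr(2)
b <- expr(3)

# !! injects the contents of a and b before the outer expression is built
injected <- expr(!!a + !!b)
injected
#> 2 + 3

# The resulting expression can then be evaluated as usual
eval(injected)
#> [1] 5
```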
You can see an example of where !! comes in handy within {admiral} code
in Common Pitfall 1, where the contents of an expression are unquoted so
that they can be passed to derive_vars_merged().
!!! (bang-bang-bang) is the plural version of !! and can be used to
unquote a list of expressions.
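As a brief sketch, a list of illustrative expressions can be spliced either into a list of expressions or into the arguments of a call:

```r
library(rlang)

a <- expr(2)
b <- expr(3)

# Splice the list elements in as separate expressions
spliced <- exprs(!!!list(a, b))
length(spliced)
#> [1] 2

# Splice the list elements in as separate arguments of a call
sum_call <- expr(sum(!!!list(a, b)))
sum_call
#> sum(2, 3)
eval(sum_call)
#> [1] 5
```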
Within {admiral}, this operator can be useful if we need to unquote a
list of variables (stored as expressions) to use them inside of an
{admiral} or even {dplyr} call.
One example is the {admiral} subject keys, which are stored in the
subject_keys option.
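The stored keys can be retrieved as sketched below. Unless the option has been changed in the current session, the result is expected to be a list holding the symbols STUDYID and USUBJID.

```r
library(admiral)

# Retrieve the subject keys; the result is a list of expressions
subject_keys <- get_admiral_option("subject_keys")
subject_keys

# Each element is a symbol (an unquoted variable name), not a string
vapply(subject_keys, is.symbol, logical(1))
```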
If we want to use the subject keys stored within this {admiral} option
to subset a dataset, we need to use !!! to unquote this list. Let’s
construct a dummy example to illustrate the point:
library(dplyr, warn.conflicts = FALSE)
library(admiral)

adcm <- data.frame(STUDYID = "XXX", USUBJID = "XXX-1", CMTRT = "ASPIRIN")
adcm
#> STUDYID USUBJID CMTRT
#> 1 XXX XXX-1 ASPIRIN
# This doesn't work as we are not unquoting the subject keys
adcm %>% select(get_admiral_option("subject_keys"))
#> Error in `select()`:
#> ! Can't select columns with `get_admiral_option("subject_keys")`.
#> ✖ `get_admiral_option("subject_keys")` must be numeric or character, not a list.
# This works because we are unquoting the subject keys
adcm %>% select(!!!get_admiral_option("subject_keys"))
#> STUDYID USUBJID
#> 1 XXX XXX-1
You can see another example of !!! in action in this line of the
{admiral} ADEX template script, where it is used to dynamically control
the by variables passed to an {admiral} function.
In summary, although the expressions framework may seem slightly
clunky and mysterious to begin with, it allows for such power and
flexibility that it forms a key part of the {admiral} package. For a
comprehensive treatment of expressions, see Chapter 18 and Chapter 19 of
the Advanced R textbook. Chapter 19 specifically covers the concept of
unquoting in much more detail.
Expressions are very powerful, but this can also lead to misunderstandings about their functionality. Let’s set up some dummy data to explore common issues that new (or experienced!) programmers may encounter when dealing with expressions.
library(dplyr, warn.conflicts = FALSE)
library(admiral)
vs <- tribble(
  ~USUBJID,  ~VSTESTCD, ~VISIT,      ~VSSTRESN, ~VSSTRESU, ~VSDTC,
  "01-1301", "WEIGHT",  "SCREENING", 82.1,      "kg",      "2013-08-29",
  "01-1301", "WEIGHT",  "WEEK 2",    81.19,     "kg",      "2013-09-15",
  "01-1301", "WEIGHT",  "WEEK 4",    82.56,     "kg",      "2013-09-24",
  "01-1302", "BMI",     "SCREENING", 20.1,      "kg/m2",   "2013-08-29",
  "01-1302", "BMI",     "WEEK 2",    20.2,      "kg/m2",   "2013-09-15",
  "01-1302", "BMI",     "WEEK 4",    19.9,      "kg/m2",   "2013-09-24"
)
dm <- tribble(
  ~USUBJID,  ~AGE,
  "01-1301", 18
)
When writing more complex {admiral} code it can be easy to mistakenly
pass the wrong input to an argument that expects an expression. For
example, the code below fails because my_expression is not an
expression - it is the name of an object in the global environment
containing an expression.
my_expression <- expr(VSTESTCD == "WEIGHT" & VISIT == "SCREENING")
derive_vars_merged(
  dm,
  dataset_add = select(vs, USUBJID, VSTESTCD, VISIT),
  by_vars = exprs(USUBJID),
  filter_add = my_expression
)
#> Error in `derive_vars_merged()`:
#> ! Argument `filter_add` must be a filter condition, but is a symbol
To fix this code, we need to unquote my_expression with !! so that the
expression that it is holding is passed correctly to
derive_vars_merged().
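The corrected call can be sketched as follows. To keep the block self-contained, vs and dm are redefined here as trimmed-down copies of the dummy data above.

```r
library(dplyr, warn.conflicts = FALSE)
library(admiral)

# Trimmed-down copies of the dummy data used in this section
vs <- tribble(
  ~USUBJID,  ~VSTESTCD, ~VISIT,      ~VSSTRESN,
  "01-1301", "WEIGHT",  "SCREENING", 82.1,
  "01-1301", "WEIGHT",  "WEEK 2",    81.19,
  "01-1302", "BMI",     "SCREENING", 20.1
)
dm <- tribble(
  ~USUBJID,  ~AGE,
  "01-1301", 18
)

my_expression <- expr(VSTESTCD == "WEIGHT" & VISIT == "SCREENING")

# !! unquotes my_expression so that the filter condition itself is passed
merged <- derive_vars_merged(
  dm,
  dataset_add = select(vs, USUBJID, VSTESTCD, VISIT),
  by_vars = exprs(USUBJID),
  filter_add = !!my_expression
)
merged
```

The result should contain the single dm record with VSTESTCD and VISIT merged on from the screening weight record.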
In a similar vein to above, even if an actual expression is passed as an argument, you must make sure that it can be evaluated within the dataset of interest. This may seem trivial, but it is a common pitfall because expressions delay evaluation of code and so can delay the identification of issues. For instance, consider this example:
filter_vs_and_merge <- function(my_expression) {
  derive_vars_merged(
    dm,
    dataset_add = select(vs, USUBJID, VSTESTCD, VISIT),
    by_vars = exprs(USUBJID),
    filter_add = !!my_expression
  )
}
# This works
filter_vs_and_merge(expr(VSTESTCD == "WEIGHT" & VISIT == "SCREENING"))
#> # A tibble: 1 × 4
#> USUBJID AGE VSTESTCD VISIT
#> <chr> <dbl> <chr> <chr>
#> 1 01-1301 18 WEIGHT SCREENING
# This fails
filter_vs_and_merge(expr(VSTESTCD == "WEIGHT" & VISIT == "SCREENING" & VSTPT == "PREDOSE"))
#> Error in `filter()`:
#> ℹ In argument: `VSTESTCD == "WEIGHT" & VISIT == "SCREENING" & VSTPT ==
#> "PREDOSE"`.
#> Caused by error:
#> ! object 'VSTPT' not found
The second call fails because hidden within the expression is a mention
of VSTPT, which is not among the variables of vs that
filter_vs_and_merge() passes on to derive_vars_merged().
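One way to surface such issues earlier is to compare the variables mentioned in an expression against the columns of the dataset before merging. The helper missing_vars() below is hypothetical and not part of {admiral}. It relies on base R's all.vars(), which returns every name used in the expression, so names referring to objects outside the dataset would also be reported.

```r
library(rlang)

# Hypothetical helper: names used in the condition that are not columns
missing_vars <- function(condition, dataset) {
  setdiff(all.vars(condition), names(dataset))
}

# Dummy data mirroring the columns kept by filter_vs_and_merge()
vs_subset <- data.frame(
  USUBJID = "01-1301",
  VSTESTCD = "WEIGHT",
  VISIT = "SCREENING"
)

good <- expr(VSTESTCD == "WEIGHT" & VISIT == "SCREENING")
bad <- expr(VSTESTCD == "WEIGHT" & VISIT == "SCREENING" & VSTPT == "PREDOSE")

missing_vars(good, vs_subset)
#> character(0)
missing_vars(bad, vs_subset)
#> [1] "VSTPT"
```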