In most situations, the model parameters of growth models cannot be known beforehand. Instead, they must be estimated from experimental data. For this reason, model fitting is a central part of modeling the growth of populations. The biogrowth package can do four types of model fitting:
This vignette describes the two functions that
biogrowth implements for this:
fit_growth()
and fit_secondary_growth()
.
fit_growth()
Since version 1.0.0, biogrowth includes the
fit_growth()
function that provides a top-level interface
for fitting primary (and secondary) growth models from experimental
data. This function includes several arguments to describe the fitting
approach, the model equations and the experimental data, as well as
other factors of the model fitting.
The type of model to fit is defined in the environment
argument. If environment="constant"
(default), the function
only fits a primary growth model to the data. On the other hand, if
environment="dynamic"
, the model fits both primary and
secondary models according to the gamma approach. In this case, the
function can fit the growth model either to a single growth experiment
or to several ones (with different environmental conditions). This
behavior is controlled by argument approach
.
Once the fitting approach has been defined, the model equations are
defined through the model_keys
argument, and the
experimental data through fit_data
. Models fitted under
dynamic environmental conditions are defined through
env_conditions
.
Model fitting is done through the FME package. This
package includes two functions for model fitting: modFit()
that uses (non-linear) regression, and modMCMC()
that uses
an adaptive Monte Carlo algorithm. The function
fit_growth()
allows the selection of a fitting approach
using the algorithm
argument. If the MC algorithm is
selected, the number of iterations is defined through
niter
. Both algorithms need initial guesses for every model
parameter. These can be defined through the start
argument.
The biogrowth package includes several functions to aid
in the definition of starting guesses of the model parameters, which
will be described below. Furthermore, fit_growth()
allows
fixing any model parameter with the known
argument. Lastly,
additional arguments can be passed to either modFit()
or
modMCMC()
with the ...
argument.
Apart from that, the function includes several checks of the model
definition. These can be turned off setting
check=FALSE
.
Both predict_growth()
and fit_growth()
can
make the calculations in different log-bases for both the population
size and parameter \(\mu\) using the
arguments logbase_mu
and logbase_logN
. The
reader is referred to the specific vignette dedicated to this topic for
details on the units definition and the interpretation.
Finally, the formula
argument maps the elapsed time and
the population size to the column names in the data. By default,
formula = logN ~ time
, meaning that the elapsed time is
named time
and the population size is named
logN
in the data.
The following sections provide additional details on the use of
fit_growth()
for different types of model fitting.
The most simple approach implemented in fit_growth()
is
fitting primary growth models to experimental data that includes the
variation of a population during an experiment. This behaviour is
defined when environment="constant"
. In this case,
fit_growth()
requires the definition of (at least) 4
arguments:
fit_data
to pass the experimental datamodel_keys
defining the model equationstart
setting the initial values for the model
parametersknown
defining the fixed model parametersThe experimental data must be defined as a tibble (or data frame). It
must include two columns: one defining the elapsed time (by default,
named time
) and a second one defining the logarithm of the
population size (by default, named logN
). Note that the
default names can be changed using the formula
argument. In
this example, we will define a tibble by hand.
my_data <- data.frame(time = c(0, 10, 15, 20, 25, 30, 35, 40, 45, 50),
logN = c(1.2, 1, 1.7, 1.9, 2.3, 3.1, 3.3, 3.8, 4.3, 4.1)
)
ggplot(my_data) + geom_point(aes(x = time, y = logN))
Note that, by default, the population size is interpreted in log10
scale. The base of the logarithm can be modified using the
logbase_logN
argument. Please check the vignette dedicated
to the topic for further details.
The next step in model fitting is the definition of the growth model
to fit to the data using the model_keys
argument. This
argument is a named list assigning model equations to the different
models. In the case when environment="constant"
, the
functions only fits a primary model. Therefore, model_keys
must have a unique entry named “primary” with the model key of a primary
model. A list of valid keys can be retrieved calling
primary_model_data()
without any argument.
In this example, we will use the Baranyi growth model.
In the case of primary model fitting, fit_growth()
uses
non-linear regression through modFit()
. This algorithm
requires the definition of initial guesses for every model parameter.
This is done in fit_growth()
through the argument
start
, which is a named numeric vector (or list). The names
of the vector must be parameter keys defined within
biogrowth. These can be retrieved (as well as
additional model meta-data) by passing a model key to
primary_model_data()
.
Therefore, the Baranyi model requires the definition of four model parameters. For a description of the model equations and the interpretation of the model parameters, please check the specific package vignette.
The biogrowth package includes several functions to
aid in the definition of initial guesses for the model parameters. The
function check_growth_guess()
takes three arguments:
fit_data
defining the experimental data as in
fit_growth()
model_name
, a character vector of length one defining
the primary model (as per primary_model_data()
)guess
, a numeric vector defining the initial guess as
in fit_growth()
environment
, describing the type of environmental
conditions ("constant"
(default) or
"dynamic"
)env_conditions
defining the variation of the
environmental conditions (only if
environment="dynamic"
)Then, the function creates a plot comparing the experimental data against the initial guess.
start <- c(logNmax = 6, lambda = 25, logN0 = 2, mu = .2) # Initial guess
check_growth_guess(
my_data,
models,
start
)
In this case, the plot shows that the initial guess is moderately far
from the experimental data. For instance, the values of
logN0
and logNmax
are too high. These could
potentially cause some convergence issues. Note that this function can
use different log-scales using logbase_mu
or
logbase_logN
.
In order to facilitate the definition of initial guesses,
biogrowth includes make_guess_primary()
.
This function applies some heuristic rules to define guesses for the
initial model parameters. It takes two arguments:
fit_data
that passes the experimental data as described
aboveprimary_model
that defines the growth modelFurthermore, the function includes a logbase_mu
argument
to define the logbase of parameter \(\mu\).
Calling this function with our data and the Baranyi model returns a named vector with an initial guess of the model parameters.
auto_guess <- make_guess_primary(my_data, "Baranyi")
print(auto_guess)
#> logN0 mu lambda logNmax
#> 1.20000000 0.08857143 10.00000000 4.30000000
We can now use check_growth_guess()
to assess the
quality of the initial guess
From the plot, we can see that the initial guess is already quite close to the experimental data.
The last argument to be defined is known
, which allows
fixing any model parameter to a given value. This argument is also a
named list with the same conventions as start
. In a first
example, we will define it as an empty vector, indicating that no model
parameter is fixed (i.e. every one is estimated from the data).
Then, we can call fit_growth()
with these arguments.
The function returns an instance of GrowthFit
. It
includes several S3 methods to facilitate the interpretation of the
results of the model fit. For instance, the print()
method
provides a quick view of the model fitted:
print(primary_fit)
#> Primary growth model fitted to data
#>
#> Growth model: Baranyi
#>
#> Estimated parameters:
#> logN0 mu lambda logNmax
#> 1.0987559 0.1104077 13.2555219 4.2695511
#>
#> Fixed parameters:
#> NULL
#>
#> Parameter mu defined in log-10 scale
#> Population size defined in log-10 scale
The statistical summary of the fit can be retrieved using the
summary()
method.
summary(primary_fit)
#>
#> Parameters:
#> Estimate Std. Error t value Pr(>|t|)
#> logN0 1.09876 0.16566 6.633 0.000566 ***
#> mu 0.11041 0.01511 7.306 0.000336 ***
#> lambda 13.25552 3.09831 4.278 0.005215 **
#> logNmax 4.26955 0.20220 21.116 7.35e-07 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.1925 on 6 degrees of freedom
#>
#> Parameter correlation:
#> logN0 mu lambda logNmax
#> logN0 1.0000 0.3849 0.7906 -0.1485
#> mu 0.3849 1.0000 0.8020 -0.5394
#> lambda 0.7906 0.8020 1.0000 -0.3253
#> logNmax -0.1485 -0.5394 -0.3253 1.0000
And the plot()
method compares the model fitted against
the experimental data. This function takes additional arguments that
enable editing several aesthetics of the plot. For a complete list,
please check the help page of GrowthFit
. Note that this
method returns an instance of ggplot
, so it can be edited
using additional layers.
The instance of GrowthFit
includes additional methods to
inspect and use the fitted model (predict, vcov, AIC…). For a complete
list, please check its help page. Moreover, this instance is a subclass
of list
, making it easy to access several aspects of the
data. Namely, it has the following entries:
data
with the experimental datastart
initial values of the model parametersknown
fixed model parametersbest_prediction
an instance of
GrowthPrediction
with the fitted curvelogbase_mu
the logbase for \(\mu\)logbase_logN
the logbase for the population sizeenvironment
the type of environment
("constant"
for this type of fit)algorithm
the fitting algorithm
("regression"
for this type of fit)primary_model
the primary model fittedfit_results
an instance of modFit
with the
fitted modelAs was mentioned above, the known
argument allows fixing
any model parameter to a given value. It must be a named vector with the
same conventions as for the initial values. For instance, we can fix the
maximum population size to 4.
If this argument was passed to fit_growth()
together
with auto_guess
, the function would return a warning (as
long as check=TRUE
). The reason for this is that
logNmax
is defined both as an initial guess (i.e. a
parameter to fit) and a fixed parameter. Therefore, the initial guess
must be modified, including only the parameters to fit
(i.e. logN0
, mu
and lambda
)
new_guess <- auto_guess[c("logN0", "mu", "lambda")]
new_fit <- fit_growth(my_data,
models,
new_guess,
known,
environment = "constant",
)
The print
method of the new_fit
reflects
the fact that logNmax
was now fixed.
new_fit
#> Primary growth model fitted to data
#>
#> Growth model: Baranyi
#>
#> Estimated parameters:
#> logN0 mu lambda
#> 1.1178327 0.1194789 14.0961385
#>
#> Fixed parameters:
#> logNmax
#> 4
#>
#> Parameter mu defined in log-10 scale
#> Population size defined in log-10 scale
And the summary
method only includes the statistical
information for the fitted parameters.
summary(new_fit)
#>
#> Parameters:
#> Estimate Std. Error t value Pr(>|t|)
#> logN0 1.11783 0.17671 6.326 0.000394 ***
#> mu 0.11948 0.01813 6.590 0.000307 ***
#> lambda 14.09614 3.14958 4.476 0.002882 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.2129 on 7 degrees of freedom
#>
#> Parameter correlation:
#> logN0 mu lambda
#> logN0 1.0000 0.3828 0.7810
#> mu 0.3828 1.0000 0.8007
#> lambda 0.7810 0.8007 1.0000
Another relevant function included in biogrowth is
time_to_size
. This function takes two arguments:
model
an instance of GrowthFit
(or other
object returned by fit_growth()
)size
a target population size (in log units)Then, it calculates the elapsed time required to reach the given size
If the target population size is not reached within the fitted curve,
the function returns NA
.
This function can take two additional arguments. The argument
logbase_logN
allows defining size
in different
log-bases. By default, this function assumes the same base as the one
used in the instance of GrowthFit
. The second argument is
type. By default, type="discrete"
, indicating that the
function calculates a single value of the elapsed time required to reach
the target population size. Setting type = "distribution"
accounts for parameter uncertainty and calculates a distribution for the
elapsed time. This is described in detail in the vignette about
including uncertainty in growth predictions.
The fit_growth()
function can also be used to fit a
growth model based on both primary and secondary models to a set of data
gathered under dynamic experimental conditions. This behavior is
triggered by setting environment="dynamic"
. Then,
fit_growth()
supports two fitting approaches. The first one
assumes that every data point was gathered (at different time points of)
one or more growth experiments under a single, dynamic environmental
profile. This method is set up setting approach="single"
.
The second one, triggered when approach="global"
considers
that different experiments could have different (dynamic) environmental
conditions. This section describes the former approach, whereas the
latter will be described in the following section.
In this case, when environment="dynamic"
and
approach="single"
, the function requires the definition of
the following arguments:
fit_data
to pass the experimental datamodel_keys
defining the model equationstart
setting the initial values for the model
parametersknown
defining the fixed model parametersalgorithm
selecting either regression or an Adaptive
Monte Carlo algorithmenv_conditions
describing the variation of the
environmental conditions during the experimentniter
: number of iterations of the MC algorithm (only
if algorithm="MCMC"
)The arguments env_conditions
and fit_data
are tibbles (or data frames) that describe, respectively, the
experimental design and the observations. The biogrowth
package includes two example datasets to illustrate the input
requirements for this function.
The tibble passed to the argument env_conditions
must
have a column defining the elapsed time (time
by default)
and as many additional columns as environmental factors. The
fit_growth()
function is totally flexible with respect to
the number of factors or the way they are named. The only requirement is
that the definition of every argument is consistent. In the case of
example_env_conditions
, this dataset considers two factors:
temperature and water activity (aw
).
head(example_env_conditions)
#> # A tibble: 3 × 3
#> time temperature aw
#> <dbl> <dbl> <dbl>
#> 1 0 20 0.99
#> 2 5 30 0.95
#> 3 15 35 0.9
Note that for the calculations this function joins the data points by linear interpolation, as shown in this plot:
The tibble passed to the argument fit_data
must have one
column defining the elapsed time (time
by default) and one
defining the logarithm of the population size (logN
by
default). Different column names can be defined using the
formula
argument. For instance,
formula=log_size ~ Time
would mean that the elapsed time is
called “Time” and the logarithm of the population size is called
“log_size”. Note that the name of the column describing the elapsed time
in fit_data
must be identical to the one in
env_conditions
.
head(example_dynamic_growth)
#> # A tibble: 6 × 2
#> time logN
#> <dbl> <dbl>
#> 1 0 0.0670
#> 2 0.517 -0.00463
#> 3 1.03 -0.0980
#> 4 1.55 -0.0986
#> 5 2.07 0.111
#> 6 2.59 -0.0465
By default, the calculations are done considering that
logN
is defined in log10 scale. Nonetheless, this can be
modified with the argument logbase_logN
. Please check the
vignette on the topic for additional details.
The type of secondary model for each environmental factor is defined
by the argument model_names
. This argument is a named
vector where the names refer to the environmental factor and the value
to the type of model. Supported models can be retrieved using
secondary_model_data()
.
In this example we will use cardinal models for water activity and a
Zwietering-type model for temperature. Note that the names of this
vector must be identical to the columns in
env_conditions
.
This modelling approach implements two algorithms for model fitting:
non-linear regression (FME::modFit()
) and an adaptive Monte
Carlo algorithm (FME:modMCMC()
). This can be selected
through the algorithm
argument. If
algorithm="regression"
(default), the fitting is done with
FME:modFit()
. If algorithm="MCMC"
, the fitting
is done with FME::modMCMC()
. In this vignette, we focus on
the former. For a description on MCMC fitting, please check the vignette
on modelling growth including uncertainty.
Regardless of the fitting algorithm, fit_growth()
enables to fit or fix any model parameter. This distinction is made
using the arguments known
(fixed parameters) and
start
(fitted parameters). Note that every parameter of the
primary and secondary model must be in either of these arguments without
duplication.
This function uses the Baranyi primary model. It has two variables
that need initial values (N0
and Q0
) and one
primary model parameter (Nmax
). The specific growth rate is
described using the gamma concept. This requires the definition of its
value under optimal conditions (mu_opt
) as well as the
cardinal parameters for each environmental factor. They must be defined
as factor
+_
+parameter name
. For
instance, the minimum water activity for growth must be written as
aw_xmin
.
In this example we will consider the model parameters of the primary
model as known. For the secondary model for water activity, we will only
consider the optimum value for growth as unknown. Finally, for the
effect of temperature, we will consider the order and xmax
are known:
known_pars <- c(Nmax = 1e4, # Primary model
N0 = 1e0, Q0 = 1e-3, # Initial values of the primary model
mu_opt = 4, # mu_opt of the gamma model
temperature_n = 1, # Secondary model for temperature
aw_xmax = 1, aw_xmin = .9, aw_n = 1 # Secondary model for water activity
)
Then, the remaining model parameters must be defined in
start
. Due to the use of non-linear regression for model
fitting, it is required to define initial values for these parameters.
In order to ease the definition of initial guesses for these parameters,
biogrowth includes the function
check_growth_guess()
. As already mentioned, it takes five
arguments:
fit_data
defining the experimental data as in
fit_growth()
model_name
, a character vector of length one defining
the primary model (as per primary_model_data()
)guess
, a numeric vector defining the initial guess as
in fit_growth()
environment
, describing the type of environmental
conditions ("constant"
(default) or
"dynamic"
)env_conditions
defining the variation of the
environmental conditions (only if
environment="dynamic"
)All these arguments follow the same conventions as in
fit_growth()
. Furthermore, the function includes additional
arguments to define the log-base of \(\mu\) (logbase_mu
) and to
define the column names for the population size and the elapsed time
(formula
).
Then, we can define an initial guess of the model parameters
And pass it to check_growth_guess()
, together with the
values of the fixed parameters
check_growth_guess(
example_dynamic_growth,
sec_model_names,
c(my_start, known_pars),
environment = "dynamic",
env_conditions = example_env_conditions
)
This plot shows that the initial guess of the model parameters is
reasonably close to the experimental data. Then, once every argument has
been defined, we can call fit_growth()
my_dyna_fit <- fit_growth(example_dynamic_growth,
sec_model_names,
my_start, known_pars,
environment = "dynamic",
env_conditions = example_env_conditions
)
Again, fit_growth()
returns an instance of
GrowthFit
with the same S3 methods as for models fitted
under constant environmental conditions.
print(my_dyna_fit)
#> Growth model fitted to data gathered under dynamic environmental conditions using regression
#>
#> Environmental factors included: temperature, aw
#>
#> Secondary model for temperature: Zwietering
#> Secondary model for aw: CPM
#>
#> Parameter estimates:
#> temperature_xmin temperature_xopt aw_xopt
#> 25.7926504 32.4099087 0.9886491
#>
#> Fixed parameters:
#> Nmax N0 Q0 mu_opt temperature_n
#> 1e+04 1e+00 1e-03 4e+00 1e+00
#> aw_xmax aw_xmin aw_n
#> 1e+00 9e-01 1e+00
#> Parameter mu defined in log-10 scale
#> Population size defined in log-10 scale
summary(my_dyna_fit)
#>
#> Parameters:
#> Estimate Std. Error t value Pr(>|t|)
#> temperature_xmin 25.79265 0.62454 41.30 < 2e-16 ***
#> temperature_xopt 32.40991 5.46540 5.93 2.54e-06 ***
#> aw_xopt 0.98865 0.03988 24.79 < 2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.1478 on 27 degrees of freedom
#>
#> Parameter correlation:
#> temperature_xmin temperature_xopt aw_xopt
#> temperature_xmin 1.0000 -0.5433 0.4490
#> temperature_xopt -0.5433 1.0000 -0.9939
#> aw_xopt 0.4490 -0.9939 1.0000
Nonetheless, the plot()
method can include the variation
of one environmental factor alongside the growth curve. This is done by
passing the name of an environmental factor to the
add_factor
argument. Note that this name must be identical
to the ones used for model definition.
Again, the time_to_size()
function can be used to
calculate the time required to reach a given population size (in log
units):
As mentioned above, fit_growth()
can also fit a unique
growth model to a set of experimental data including the results of
growth experiments under different (dynamic) environmental conditions.
This is triggered passing approach="global"
. In this case,
the function takes the same arguments as for
approach="single"
.
fit_data
to pass the experimental datamodel_keys
defining the model equationstart
setting the initial values for the model
parametersknown
defining the fixed model parametersalgorithm
selecting either regression or an Adaptive
Monte Carlo algorithmenv_conditions
describing the variation of the
environmental conditions during the experimentniter
: number of iterations of the MC algorithm (only
if algorithm="MCMC"
)However, the conventions for the definition of fit_data
and env_conditions
change when
approach="global"
. Instead of being defined as a tibble (or
data.frame), this information must be defined as a (named) list with as
many elements as experiments. Then, each entry describes the particular
information for each experiment. The biogrowth package
includes the multiple_counts
dataset as an example of
this.
It is a list of two elements named exp1
and
exp2
. Then, each one of them is a tibble (it could also be
a data.frame) describing the experimental observations for each
experiment following the same conventions as when
approach="single"
data("multiple_counts")
names(multiple_counts)
#> [1] "exp1" "exp2"
head(multiple_counts[[1]])
#> time logN
#> 1 0.000000 -0.20801574
#> 2 1.666667 -0.03630986
#> 3 3.333333 -0.29846914
#> 4 5.000000 0.35029686
#> 5 6.666667 0.14326140
#> 6 8.333333 -0.40357904
In a similar way, the variation of the experimental conditions during
the experiment are also described as a (named) list. This list should
have the same length (and names) as the experimental data, so each
growth experiment is mapped to an environmental profile. The conditions
during each experiment are described following the same conventions as
when algorithm="single"
. That is, using a tibble (or
data.frame) with one column describing the elapsed time and as many
additional columns as environmental factors considered in the model.
Note that, although the function admits any number of environmental
conditions, these must be presnet in each experiment.
The multiple_conditions
dataset included in the package
is a convenient example of this format. It includes the results of two
experiments, paired with multiple_counts
, that consider two
environmental factors: temperature
and pH
.
data("multiple_conditions")
names(multiple_conditions)
#> [1] "exp1" "exp2"
head(multiple_conditions[[1]])
#> time temperature pH
#> 1 0 20 6.5
#> 2 15 30 7.0
#> 3 40 40 6.5
The model equations, the initial guesses of the model parameters and
the fixed parameters are defined in exactly the same way as when
approach="single"
sec_models <- list(temperature = "CPM", pH = "CPM")
## Any model parameter (of the primary or secondary models) can be fixed
known_pars <- list(Nmax = 1e8, N0 = 1e0, Q0 = 1e-3,
temperature_n = 2, temperature_xmin = 20,
temperature_xmax = 35,
pH_n = 2, pH_xmin = 5.5, pH_xmax = 7.5, pH_xopt = 6.5,
temperature_xopt = 30)
## The rest, need initial guesses
my_start <- list(mu_opt = .8)
Again, check_growth_guess()
can be used to assess the
quality of the initial guess of the model parameters.
check_growth_guess(
multiple_counts,
sec_models,
c(my_start, known_pars),
environment = "dynamic",
env_conditions = multiple_conditions,
approach = "global"
)
Considering that the initial guess is relatively close to the
observations, once all the arguments have been defined, we can call
fit_growth()
global_fit <- fit_growth(multiple_counts,
sec_models,
my_start,
known_pars,
environment = "dynamic",
algorithm = "regression",
approach = "global",
env_conditions = multiple_conditions
)
In this case, the function returns an instance of
GlobalGrowthFit
. The print method provides a summary of the
model fitted.
print(global_fit)
#> Growth model fitted to data following a global approach conditions using regression
#>
#> Number of experiments: 2
#>
#> Environmental factors included: temperature, pH
#>
#> Secondary model for temperature: CPM
#> Secondary model for pH: CPM
#>
#> Parameter estimates:
#> mu_opt
#> 0.5097542
#>
#> Fixed parameters:
#> Nmax N0 Q0 temperature_n
#> 1.0e+08 1.0e+00 1.0e-03 2.0e+00
#> temperature_xmin temperature_xmax pH_n pH_xmin
#> 2.0e+01 3.5e+01 2.0e+00 5.5e+00
#> pH_xmax pH_xopt temperature_xopt
#> 7.5e+00 6.5e+00 3.0e+01
#> Parameter mu defined in log-10 scale
#> Population size defined in log-10 scale
A more detailed output of the fitted model can be retrieved by
summary()
summary(global_fit)
#>
#> Parameters:
#> Estimate Std. Error t value Pr(>|t|)
#> mu_opt 0.509754 0.005812 87.7 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.463 on 49 degrees of freedom
#>
#> Parameter correlation:
#> mu_opt
#> mu_opt 1
In this case, plot()
shows several subplots, one for
each experiment
As well as before, the plot can include the variation of any
environmental factor alongside the growth curve, by passing the name of
an environmental factor to add_factor
. This function
includes additional arguments for editing other aesthetics of the plot.
For a complete list of options, please check the help page of
GlobalGrowthFit
.
plot(global_fit, add_factor = "temperature",
label_y2 = "Temperature (ºC)",
line_col2 = "green",
line_type2 = "dotted")
Again, the time_to_size()
function can be used to
estimate the elapsed time required to reach a given population size (in
log units). In this case, the function returns a named vector, with the
elapsed time required to reach the target population size for each
experiment
If the target population size was not reached for any of the
experiments, the function returns NA
fit_secondary_growth()
The other modeling approach included in biogrowth is
fitting secondary growth models to a dataset comprising several growth
rates estimated from various growth experiments (i.e. several values of
\(\mu\)). Because the hypotheses under
this approach are very different from those used for fitting primary
models and primary & secondary models, this approach is implemented
in a separate function: fit_secondary_growth()
. This
function has 8 arguments:
fit_data
: data used for the fit.starting_point
: initial value of the model
parametersknown_pars
: model parameters fixed (not fitted to the
data).sec_model_names
: type of secondary model for each
environmental factor.transformation
: transformation of the growth rate for
the fit (square root by default)....
: additional arguments passed to
modFit()
.check
: whether to do some validity checks of the model
parameters (TRUE
by default).formula
: a formula defining the column name defining
the growth rate.The fit_data
argument must be a tibble containing the
growth rates observed in several experiments under static environmental
conditions. It must have one column describing the observed growth rate.
Then, it must have as many additional columns as environmental factors
included in the experiment. By default, the column describing the growth
rate must be named mu
. This can be changed using the
formula
argument, which is a one-sided formula, where the
left hand side defines the column name.
The biogrowth package includes the dataset
example_cardinal
to illustrate the data used by this
function. It represents the specific growth rate (in log10 CFU/h)
observed in several growth experiments under static environmental
conditions, where each row represent one experiment. In this simulated
dataset, two environmental factors were considered: temperature and
pH.
data("example_cardinal")
head(example_cardinal)
#> temperature pH mu
#> 1 0.000000 5 9.768505e-04
#> 2 5.714286 5 2.624919e-03
#> 3 11.428571 5 0.000000e+00
#> 4 17.142857 5 1.530706e-04
#> 5 22.857143 5 2.301817e-05
#> 6 28.571429 5 3.895598e-04
In the example dataset, the series of experiments considered two
environmental conditions: temperature and pH. Nonetheless, the
fit_secondary_growth()
function is entirely flexible with
respect to the number of factors and their names. The only restriction
is that the definition of the columns of the dataset and the secondary
models is consistent.
The type of secondary model to use for each environmental factor is
defined in the sec_model_names
argument. It is a named
vector whose names are the environmental factors and whose values define
the model to use. The model keys implemented in
biogrowth can be retrieved using
secondary_model_data()
.
For this example, we will use a CPM for pH and an Zwietering model
for temperature (this decision is not based on any scientific argument,
just as demonstration of the functions in the package). Note that the
names of the vector are identical to the column names of
fit_data
.
The fit_secondary_growth()
function estimates the values
of the cardinal parameters, as well as the growth rate under optimal
conditions using the modFit
function from
FME, which uses non-linear regression. This algorithm
requires the definition of initial guesses for every parameter estimate.
These can be defined based on experience or exploratory analysis. In
order to facilitate this definition, biogrowth includes
the function make_guess_secondary()
. This function makes a
guess of the model parameters based on basic heuristic rules. It takes
two arguments:
fit_data
the data for the fitsec_model_names
a named vector assigning models for
each environmental factorBoth arguments follow the same conventions as those defined for
fit_secondary_growth()
.
Calling the function returns a numeric vector with initial guesses for every parameter estimate.
sec_guess <- make_guess_secondary(example_cardinal, sec_model_names)
sec_guess
#> mu_opt temperature_xmin temperature_xopt temperature_n
#> 1.234784 0.000000 34.285714 2.000000
#> pH_xmin pH_xopt pH_xmax pH_n
#> 5.000000 6.428571 7.000000 2.000000
The output of this function shows the naming rules for
fit_secondary_growth()
The growth rate under optimal
conditions is named mu_opt
. The remaining cardinal
parameters are named according to the convention
environ_factor
+_
+parameter(lower case)
.
For instance, the minimum temperature for growth is
temperature_xmin
and the order of the CPM for pH is
pH_n
. Note that the environmental factor must be identical
to the one used in sec_model_names
.
The last argument to define before calling
fit_secondary_growth()
is known
, which allows
fixing any model parameter. As a first try, we will assign to it an
empty vector, indicating that every model parameter is fitted from the
data.
Finally, the transformation
argument defines the
transformation of the growth rate to use for model fitting. By default,
the function applies a square root transformation, which has proved to
stabilize the variance of microbial growth. Once the arguments have been
defined, we can call the fit_secondary_growth()
function.
Note that, because we are using the default value of
transformation
, we do not need to define this argument. The
same applies to formula, as the growth rate is named mu
in
example_cardinal
.
Then, we can call the fit_secondary_growth()
function.
fit_cardinal <- fit_secondary_growth(example_cardinal,
sec_guess,
known,
sec_model_names)
#> Warning in sqrt(my_data$pred_mu): Se han producido NaNs
#> Warning in sqrt(my_data$pred_mu): Se han producido NaNs
#> Warning in sqrt(my_data$pred_mu): Se han producido NaNs
#> Warning in sqrt(my_data$pred_mu): Se han producido NaNs
#> Warning in sqrt(my_data$pred_mu): Se han producido NaNs
#> Warning in sqrt(my_data$pred_mu): Se han producido NaNs
#> Warning in sqrt(my_data$pred_mu): Se han producido NaNs
#> Warning in sqrt(my_data$pred_mu): Se han producido NaNs
#> Warning in sqrt(my_data$pred_mu): Se han producido NaNs
#> Warning in sqrt(my_data$pred_mu): Se han producido NaNs
#> Warning in sqrt(my_data$pred_mu): Se han producido NaNs
#> Warning in sqrt(my_data$pred_mu): Se han producido NaNs
#> Warning in sqrt(my_data$pred_mu): Se han producido NaNs
#> Warning in sqrt(my_data$pred_mu): Se han producido NaNs
#> Warning in sqrt(my_data$pred_mu): Se han producido NaNs
Before doing the calculations, this function does several validity
checks of the model parameters, raising warnings or errors if there is
some discrepancy between the parameter definition and the model
requirements. If the fitting was successful, it returns an instance of
FitSecondaryGrowth
.
This class incorporates several S3 method to ease visualization of
results. For instance, summary()
returns the statistical
information of the fit.
summary(fit_cardinal)
#> Warning in summary.modFit(object$fit_results): Cannot estimate covariance;
#> system is singular
#>
#> Parameters:
#> Estimate Std. Error t value Pr(>|t|)
#> mu_opt 1.112 NA NA NA
#> temperature_xmin 5.305 NA NA NA
#> temperature_xopt 35.685 NA NA NA
#> temperature_n 1.005 NA NA NA
#> pH_xmin 5.136 NA NA NA
#> pH_xopt 6.476 NA NA NA
#> pH_xmax 7.000 NA NA NA
#> pH_n 3.012 NA NA NA
#>
#> Residual standard error: 0.04752 on 56 degrees of freedom
#> Error in cov2cor(x$cov.unscaled): 'V' is not a square numeric matrix
In this case, the summary method shows NA
values for the
standard error of every model parameter. This is due to the poor
identifiability of the model (e.g., due to parameter correlation). For
this reason, it is often needed for this kind of model to fix some of
the model parameters. This can be done with the known_pars
argument, which is a numeric vector following the same conventions as
starting_point
.
The selection of parameters to fix is complex and is outside of the scope of this article. We recommend the interested reader checking the documentation of the FME package, as well as the scientific literature on the topic (e.g. Tsiantis et al. (2018); 10.1093/bioinformatics/bty139). For this example, we will consider that the growth rate under optimum conditions is known, as well as most of the the cardinal parameters for pH. Regarding temperature, we will only fix the order of the model.
For the remaining model parameters, we will asign values close to
those defined in the initial guess calculated by
make_guess_secondary()
.
Once these parameters have been defined, we can call again the fitting function.
Now, the summary
method returns the standard error of
the model parameters.
summary(fit_cardinal)
#>
#> Parameters:
#> Estimate Std. Error t value Pr(>|t|)
#> temperature_xmin 5.31768 0.14965 35.53 <2e-16 ***
#> temperature_xopt 34.29640 0.76253 44.98 <2e-16 ***
#> pH_xopt 6.51109 0.01171 555.90 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.04026 on 61 degrees of freedom
#>
#> Parameter correlation:
#> temperature_xmin temperature_xopt pH_xopt
#> temperature_xmin 1.000e+00 -0.1911 -2.376e-09
#> temperature_xopt -1.911e-01 1.0000 -2.173e-01
#> pH_xopt -2.376e-09 -0.2173 1.000e+00
Besides summary
, the FitSecondaryGrowth
implements other S3 methods to inspect the fitting object. The print
method provides an overview of the fitted model.
print(fit_cardinal)
#> Secondary model estimated from data
#>
#> Environmental factors included: temperature, pH
#>
#> mu_opt: 1.2
#>
#> Secondary model for temperature:
#> xmin xopt n model
#> "5.31768272927254" "34.2963979942642" "1" "Zwietering"
#>
#> Secondary model for pH:
#> xmin xopt xmax n
#> "5.2" "6.51109154194514" "6.8" "2"
#> model
#> "CPM"
The package includes S3 methods to plot the results. By default, a plot comparing observed and predicted values is shown
In this plot, the dashed line is the line with intercept 0 and slope 1 where every point should fall in case of a perfect fit. The solid gray line is the regression line of predictions vs observations.
Alternatively, by passing the argument which=2
, one can
plot the observed and predicted counts as a function of the
environmental factors
A trend line can be added to this plot using the
add_trend=TRUE
argument:
plot(fit_cardinal, which = 2, add_trend = TRUE)
#> `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
#> `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Note that this line is not the predictions of the gamma model but a
trend line based on the observations or the predictions. Therefore, it
can be used to compare the tendency of the model predictions against the
one of the observations, but it should not be used for model predictions
(the predict
method should be used instead).
Besides these methods, it also includes other common S3 methods for
this type of object (residuals
, coef
,
vcov
, deviance
, fitted
,
predict
…). Please check the help page of
FitSecondaryGrowth
for a complete list of available
methods.
Note that this class is a subclass of list
, making it
easy to access different aspects of the fit directly. Namely, it has 5
items:
fit_results
the object returned by
modFit
.secondary_model
parameters of the secondary model.mu_opt_fit
estimated growth rate under optimal storage
conditions.data
data used for the fit.transformation
transformation applied to the growth
rate before fitting.