Multiparental populations (MPPs) like the nested association mapping (NAM; McMullen et al. (2009)) or multi-parent advanced generation inter-cross (MAGIC; Cavanagh et al. (2008)) populations have emerged as central genetic resources for research and breeding purposes (Scott et al. 2020). The diallels (Blanc et al. 2006), the factorial designs or the collections of crosses developed in breeding programs (Würschum 2012) can also be considered as MPPs. We can classify MPPs given the use of intrecrossing. In a first type of MPPs, there is no intercrossing. Each genotype inherits its alleles from two parents. Those MPPs can be seen as a collection of crosses (e.g. NAM or factorial design). A second type of MPPs like MAGIC involve genotypes that are a mosaic of more than two parents or founders. In this package, we only consider the first type of MPPs without intercrossing or collections of crosses with shared parents.
Even though several statistical methodologies and software are available to perform QTL detection in MPPs, there is potentially few or no open-source solution for the detection of QTL in MPPs characterized in multiple environments (MPP-ME). In this extension of mppR, we propose a method to detect QTLs in MPP-ME data.
We start from the QTL detection model 3 described in (Garin, Malosetti, and Eeuwijk 2020):
\[\underline{y}_{icj} = E_{j} + C_{cj} + x_{i_{q}p} * \beta_{pj} + \underline{GE}_{icj} + \underline{e}_{icj} \quad [1]\]
where \(\underline{y}_{icj}\) is the genotype adjusted mean (e.g. BLUE) of genotype \(i\) from cross \(c\) in environment \(j\). \(E_{j}\) is a fixed environment term, \(C_{cj}\) a fixed cross within environment interaction, \(x_{i_{q}p}\) is the number of allele copies coming from parent \(p\) at QTL position \(q\), and \(\beta_{pj}\) represent the QTL effect of parental allele \(p\) in environment \(j\). Different definitions of the QTL allelic effect are possible (Garin et al. 2017). In this version of the model we assumed that each parent carries a different allele and that each of those alleles can have an environmental specific effect. The model is therefore saturated with the estimation of \(N_{env} * N_{par}\) effects with the most frequent allele set as the reference in all environments (e.g. the central or recurrent parent in NAM). In model 1, QTL terms are considered as fixed. Therefore, the user needs to provide best linear unbiased estimates (BLUEs) as trait value.
The term \(GE_{icj}\) is the residual genetic variation and \(e_{icj}\) the plot error term. In model 1 because genotype values are averaged over replication \(e_{icj}\) and \(GE_{icj}\) are confounded. The variance of the \((GE_{icj} + e_{icj})\) term can be modeled with different variance covariance (VCOV) structures. A first possibility is to assume a constant genotypic variance across environments \(\sigma_{G}^{2}\) (compound symmetry - CS) and a unique variance for the plot error term \(\sigma_{e}^{2}\):
\[V\begin{bmatrix} y_{i11} \\ y_{i'21} \\ y_{i12} \\ y_{i'22} \\ y_{i13} \\ y_{i'23} \end{bmatrix} = \begin{bmatrix} \sigma_{G}^{2} + \sigma_{e}^{2} & 0 & \sigma_{G}^{2} & 0 & \sigma_{G}^{2} & 0 \\ 0 & \sigma_{G}^{2} + \sigma_{e}^{2} & 0 & \sigma_{G}^{2} & 0 & \sigma_{G}^{2} \\ \sigma_{G}^{2} & 0 & \sigma_{G}^{2} + \sigma_{e}^{2} & 0 & \sigma_{G}^{2} & 0 \\ 0 & \sigma_{G}^{2} & 0 & \sigma_{G}^{2} + \sigma_{e}^{2} & 0 & \sigma_{G}^{2} \\ \sigma_{G}^{2} & 0 & \sigma_{G}^{2} & 0 & \sigma_{G}^{2} + \sigma_{e}^{2} & 0 \\ 0 & \sigma_{G}^{2} & 0 & \sigma_{G}^{2} & 0 & \sigma_{G}^{2} + \sigma_{e}^{2} \\ \end{bmatrix}\]
The CS model requires the estimation of a single term (\(\sigma_{G}^{2}\)) to describe genetic the covariance between all pairs of environments. This model assumes that the background polygenic effect (effect of all genome positions except the tested QTL and cofactors) is the same in all environments (Van Eeuwijk et al. 2001).
A more realistic option called unstructured (UN) model allows the genetic covariance to be different in every pairs of environments. From a genetic point of view, it means that a set of genes have a different effect in each environments (Van Eeuwijk et al. 2001). In the unstructured model, each pair of environment get a specific genetic covariance term \(cov(y..j, y..j') = \sigma_{G_{jj'}}^{2}\)
\[V\begin{bmatrix} y_{i11} \\ y_{i'21} \\ y_{i12} \\ y_{i'22} \\ y_{i13} \\ y_{i'23} \end{bmatrix} = \begin{bmatrix} \sigma_{G_{1}}^{2} + \sigma_{e}^{2} & 0 & \sigma_{G_{12}} & 0 & \sigma_{G_{13}} & 0 \\ 0 & \sigma_{G_{1}}^{2} + \sigma_{e}^{2} & 0 & \sigma_{G_{12}} & 0 & \sigma_{G_{13}} \\ \sigma_{G_{12}} & 0 & \sigma_{G_{2}}^{2} + \sigma_{e}^{2} & 0 & \sigma_{G_{23}} & 0 \\ 0 & \sigma_{G_{12}} & 0 & \sigma_{G_{2}}^{2} + \sigma_{e}^{2} & 0 & \sigma_{G_{23}} \\ \sigma_{G_{13}} & 0 & \sigma_{G_{23}} & 0 & \sigma_{G_{3}}^{2} + \sigma_{e}^{2} & 0 \\ 0 & \sigma_{G_{13}} & 0 & \sigma_{G_{23}} & 0 & \sigma_{G_{3}}^{2} + \sigma_{e}^{2} \\ \end{bmatrix}\]
We considered the UN model as the default option.
The QTL detection procedure is composed of the following steps:
The function mppGE_proc()
perform the whole procedure
detailed above. For further details, see the examples below.
The calculation of an ‘exact’ mixed model at each marker (QTL) position of the genome is computationally demanding, which can quickly make detection unfeasible, especially with large populations. Therefore, to reduce the computation demand for a genome scan, we implemented an approximate mixed model computation similar to the generalized least square strategy implemented in Kruijer et al. (2015) or Garin et al. (2017). The procedure consists of estimating a general VCOV (\(\hat{V}\)) using model 1 without the tested QTL position for the simple interval mapping (SIM) scan. For the composite interval mapping (CIM) model, \(\hat{V}\) is estimated including the cofactors as fixed effect without the QTL position. For the CIM scan, there will be as many \(\hat{V}\) as the number of cofactors combinations given cofactor exclusion around the tested position.
There are two option for the estimation of \(\hat{V}\). In the first option we estimate
a unique \(\hat{V}\) taking all
cofactors into consideration (VCOV_data = "unique"
). In the
second option, different \(\hat{V}\)
should be formed by removing the cofactor information that is too close
of a tested QTL position (VCOV_data = "minus_cof"
).
The statistical significance of the tested QTL position and the allelic effects is obtained by substituting \(\hat{V}\) to get the following Wald statistic
\[W_Q = \beta^{T} [V(\beta)]^{-1} \beta \quad [2]\]
where,
\[\beta = (X^{T} \hat{V}^{-1} X)^{-1} X^T \hat{V}^{-1} y \quad [3]\]
and
\[V(\beta) = (X^{T} \hat{V}^{-1} X)^{-1}\]
\(X\) represents the fixed effect matrix including the cross effect, the cofactors, and the QTL position, and \(y\) is the vector of genotypic values. \(W_Q \sim \chi_{d}^{2}\) with degree of freedom (\(d\)) equal to the number of tested QTL allelic effects (e.g. \((N_{par}-1) * N_{env}\) for a QTL term with environment-specific allelic effects).
It is possible to estimate the QTL parental allelic effect using
model 1. Model 1 allows the estimation of QTL allelic effect for each
parents in each environment. One parent is set as reference. By default,
it is the most frequent parental allele. Therefore, in populations like
the nested association mapping (NAM) populations, the central parent is
considered as the reference. The reference parent can be specified by
the user through the ref_par
argument. The parental QTL
allelic effect must be interpreted as the extra effect of one allele
copy with respect to the reference parent in the considered
environment.
After QTL detection, it is possible to determine if the QTL allelic effects interact with the environment or rather have a consistent effect across environments. For that purpose, the parental QTL effects can be decomposed into a main effect component (\(\alpha_p\)) and a QEI component (\(\beta_{pj}\)) that are estimated simultaneously, which gives the following multi-QTL model:
\[\underline{y}_{icj} = E_{j} + C_{cj} + \sum_{q=1}^{n_{QTL}} x_{i_{q}p} * (\alpha_p + \beta_{pj}) + \underline{GE}_{icj} + \underline{e}_{icj} \quad [4]\]
We can test for the significance of the \(\alpha_p\) and \(\beta_{pj}\) terms using the Wald test. Those tests allow the determination of the specific parental effect significance (\(\alpha_p\) or \(\beta_{pj}\) significant), and the determination of the specific parental allele interaction with the environment (\(\beta_{pj}\) significant).
For the parental allelic effects showing a significant interaction with the environment in model 4, it is possible to extend the model to test the sensitivity of the QTL allele to a specific environmental covariate (\(EC_j\), e.g. temperature or humidity). For that purpose, we can rewrite the QTL effect term \(\beta_{pj}\) of the parents with a significant environmental interaction (pxE). We can replace \(\beta_{pj}\) with \(EC_j*S_p+l_{p\epsilon}\). \(EC_j\) represents the EC value in environment j associated with the sensitivity term \(S_p\). The \(S_{p}\) determines the rate of change of the parental QTL allelic additive effect given an extra unit of EC. Finally, \(l_{p\epsilon}\) is a residual effect. The fitted model becomes:
\[\underline{y}_{icj} = E_{j} + C_{cj} +
\sum_{q=1}^{n_{QTL}} x_{i_{q}p} * (\alpha_p + \beta_{pj}) + x_{i_{q}pxE}
* (\alpha_p + EC_j*S_p+l_{p\epsilon}) + \underline{GE}_{icj} +
\underline{e}_{icj} \quad [5]\] The significance of the
sensitivity term \(S_p\) can be
evaluated with the Wald test. Contrary to the model fitted for the QTL
detection scan, model 1 for QTL effect estimation, models 4 and 5 are
fitted using an ‘exact’ REML solution using the R pakage
nlme
.
The format of the data is the same as the default
mppData
objects format used in mppR
. The
traits measured in the different environments are simply added as
separate columns of the pheno
argument when you create the
mppData
object with create.mppData()
.
The data used in this vignette (mppData_GE
) come from
the EU-NAM Flint population (Bauer et al. 2013). The genotype data
come from five crosses between the donor parents: D152, F03802, F2,
F283, DK105 with the recurrent parent UH007. We selected a subset of 100
markers spread over chromosomes five and six. The genetic map was
calculated by Giraud et al. (2014). The phenotypic data represent
the within environment adjusted means for dry matter yield (DMY)
calculated at La Coruna (CIAM), Roggenstein (TUM), Einbeck (KWS), and
Ploudaniel (INRA_P) Lehermeier et al. (2014).
library(mppR)
#> mppR has been developed with the support of
#> Wageningen University
#> KWS SAAT SE & Co. KGaA
#> Swiss National Science Foundation
#> (Grant Postdoc.Mobility P500PB_203030)
data(mppData_GE)
design_connectivity(par_per_cross = mppData_GE$par.per.cross)
#> $`1`
#> [1] "UH007" "D152" "F03802" "F2" "F283" "DK105"
To perform the SIM scan, you simply need to pass the
mppData
object to the function and specify the environments
you want to use in the trait
argument. If needed, a
reference parent can be specified using the ref_par
argument. For example, here we analyse DMY characterized in the CIAM and
TUM environments. After calculation, the SIM profile can be ploted with
plot()
The cofactors as well as the QTLs are selected using the same
function (QTL_select()
). It performs an iterative search
per chromosome starting from the most significant position above the
threshold
value. Then, marker positions falling on the left
and right of the selected position by a distance specified in
window
are excluded. The search continues with the
remaining positions by repeating the selection and exclusion steps. The
search stops when no position remains above the
threshold
.
The default value of window
(50) is deliberately large
to limit the number of included cofactors and not overfit the model. In
the mppGE_proc()
function, the cofactor threshold and
window parameters can be modified through the thre.cof
and
win.cof
arguments.
To perform the CIM scan, you simply need to pass the selected
cofators in the mppGE_CIM()
function. The
window
parameter specifies the distance on the left and
right where cofactors are excluded in the neighborhood of a tested
position. The VCOV_data
argument allow to specify if the
user want to estimate a unique general VCOV with all cofactors included
("unique"
) or several VCOV removing cofactor in the
neighbouring of a QTL position ("minus_cof"
). The unique
option is faster.
The selection of QTLs is done using the same function as the one used
for cofactors. For the selection of QTLs it is possible to restrict the
size of the exclusion window
around the selected positions.
In the mppGE_proc()
function, the QTL threshold and window
parameters can be modified through the thre.QTL
and
win.QTL
arguments.
The function QTL_effect_GE()
allows the estimation of
the QTL allele effects and their significances. It calculates an exact
version of mixed model 1 with single or multiple QTLs. The QTL allele
effects are estimated using [3] and their significance with the Wald
test [2].
The results of the QTL allelic effect estimation are returned in a list where each component represent the effects of one QTL. The individual within environment QTL effects are represented in row while the effects (\(\hat{\beta}\)), their standard error, as well as their significance are represented in columns. For example, looking at the first QTL, we can see that in the first environment the allele of parent DK105 decreases the DMY by -9.861 deciton/ha compared to the central parent UH007. In the second environment, the effect of DK105 is only -1.559. The effect is highly significant in the first environment but not in the second.
Qeff <- QTL_effect_GE(mppData = mppData_GE, trait = c('DMY_CIAM', 'DMY_TUM'),
QTL = QTL)
Qeff$QTL_1
#> Effect Std.dev Wald df p.val sign
#> QTL_1_UH007_E1 NA NA NA NA NA <NA>
#> QTL_1_D152_E1 0.979 3.107 0.10 1 0.7526802395
#> QTL_1_F03802_E1 -4.632 2.265 4.18 1 0.0408643496 *
#> QTL_1_F2_E1 -3.170 3.523 0.81 1 0.3683083123
#> QTL_1_F283_E1 -7.691 2.189 12.35 1 0.0004412536 ***
#> QTL_1_DK105_E1 -9.861 2.615 14.22 1 0.0001626113 ***
#> QTL_1_UH007_E2 NA NA NA NA NA <NA>
#> QTL_1_D152_E2 2.351 3.010 0.61 1 0.4346726625
#> QTL_1_F03802_E2 1.921 2.173 0.78 1 0.3764976065
#> QTL_1_F2_E2 -3.200 3.386 0.89 1 0.3445433645
#> QTL_1_F283_E2 -0.561 2.110 0.07 1 0.7904495664
#> QTL_1_DK105_E2 -1.559 2.513 0.38 1 0.5350108782
A specific parent can be selected as reference using the
ref_par
argument. For example we can set as reference the
parent ‘F2’ instead of the central parent (UH007).
Qeff <- QTL_effect_GE(mppData = mppData_GE, trait = c('DMY_CIAM', 'DMY_TUM'),
QTL = QTL, ref_par = 'F2')
Qeff$QTL_1
#> Effect Std.dev Wald df p.val sign
#> QTL_1_UH007_E1 3.170 3.523 0.81 1 0.3683083
#> QTL_1_D152_E1 4.149 4.697 0.78 1 0.3771534
#> QTL_1_F03802_E1 -1.462 4.188 0.12 1 0.7269845
#> QTL_1_F2_E1 NA NA NA NA NA <NA>
#> QTL_1_F283_E1 -4.522 4.148 1.19 1 0.2756179
#> QTL_1_DK105_E1 -6.691 4.387 2.33 1 0.1272424
#> QTL_1_UH007_E2 3.200 3.386 0.89 1 0.3445434
#> QTL_1_D152_E2 5.552 4.530 1.50 1 0.2203929
#> QTL_1_F03802_E2 5.122 4.023 1.62 1 0.2029702
#> QTL_1_F2_E2 NA NA NA NA NA <NA>
#> QTL_1_F283_E2 2.640 3.989 0.44 1 0.5081781
#> QTL_1_DK105_E2 1.641 4.217 0.15 1 0.6970780
The function QTL_R2_GE()
allows the estimation of an
\(R^{2}\) contribution of all the QTLs
(global) and of each QTL positions (partial). For simplicity, the \(R^{2}\) is estimated using a linear model
version of [1] and not a mixed model, which means that the \(GE_{icj}\) term is absent from the model
and that only a unique variance error term \(\sigma_{e}^{2}\) is estimated. Therefore,
the \(R^{2}\) values represent general
tendencies and should be taken with caution. The global adjusted \(R^{2}\) is defined as
\[R^{2} = 100*(1-(\frac{RSS_{full}/d_{full}}{RSS_{red}/d_{red}}))\]
where \(RSS_{full}\) and \(RSS_{red}\) are the residual sum of squares of the model with and without the QTL positions. \(d_{full}\) and \(d_{red}\) represent the degrees of freedom of the corresponding models. The partial (adjusted) \(R^{2}\) of QTL \(i\) is the difference between the global \(R^{2}\) obtained with all QTLs and the \(R^{2}\) obtained with a model minus QTL \(i\).
The profile plot of the QTL significance can be obtained using the
plot.QTLprof()
function that plots the \(-log_{10}(p-val)\) of the position along
the genome.
It is also possible to visualize the significance of the QTL parental
allele along the genome by passing the QTL profile information
(CIM
) to the plot_Qeff_prof()
function. The
QTL
argument can be used to specify potential QTL
positions.
The color intensity \(z = -log10(p-val)\) is proportional to the statistical significance of the parental allelic effects obtained from the Wald test [2]. \(z\) has an upper limit of six to not let extreme signficant value influence too much the color scale. Compared to the reference parent (top of the graph) the red (blue) colour represents a negative (positive) effect. The different panels from the top to the bottom represent the different environments.
It is possible to calculate the whole procedure described using the
mppGE_proc()
function. This function uses the same
arguments as the one defined in the sub-functions
(e.g. mppData
, trait
, thre.cof
,
win.cof
, window
, ref_par
etc.).
In this function, it is possible to specify a working directory in
output.loc
. A folder with the pop.name
and
trait.name
will be created to store intermediate data
output like the SIM and CIM profiles, some results (list of QTLs,
plots), as well as a summary of the results (QTL_REPORT.txt). Here we
can still introduce the argument n.cores
which allows to
run the function in parallel.
Once a list of QTL has been determined using the genome scan
functions mppGE_SIM()
, mppGE_CIM()
or
mppGE_proc()
, it is possible to refine our understanding of
the QTL parental allelic effects by determining which of those effects
have a significant interaction with the environment. To do that, it is
possible to use the function QTL_effect_main_QEI()
, which
calculates model 4. For the illustration, we extended the number of
environments by adding the INRA and KWS locations.
Qeff <- QTL_effect_main_QEI(mppData = mppData_GE,
trait = c('DMY_CIAM', 'DMY_TUM', 'DMY_INRA_P', 'DMY_KWS'),
env_id = c('CIAM', 'TUM', 'INRA', 'KWS'),
QTL = QTL)
The function return two lists of \(N_{QTL}\) length. The first list
(Q_sign
) describes the significance of the main (\(\alpha_p\)) and QTLxE (\(\beta_{pj}\)) term per parent for each QTL.
For example, at QTL 2, we can notice that the ‘F2’, ‘F283’ and the
‘F03802’ parents have a significant interaction with the environment
because the \(-log10(p-val)\) of the
\(\beta_{pj}\) is larger than 1.301,
which correspond to p-value of 0.05.
Qeff$Q_sign$QTL2
#> par logP_main logP_QxE
#> 1 UH007 NA NA
#> 2 F2 1.05793090 3.1420765
#> 3 D152 0.96910440 0.2871339
#> 4 DK105 0.05370229 0.2525703
#> 5 F283 0.36207766 3.9077148
#> 6 F03802 0.01518149 4.6439537
The main and the QTLxE effect estimates are stored in the
Q_eff
list. For example for QTL 2, we can notice that the
last environment ‘KWS’ was automatically set as the reference.
Qeff$Q_eff$QTL2
#> Effect Std.dev Wald df p.val sign
#> QTL2_main_UH007 NA NA NA NA NA <NA>
#> QTL2_main_D152 -3.193 1.983 2.59 1 1.073731e-01
#> QTL2_main_F03802 0.062 1.438 0.00 1 9.656472e-01
#> QTL2_main_F2 -3.879 2.270 2.92 1 8.751230e-02 .
#> QTL2_main_F283 -1.129 1.444 0.61 1 4.344325e-01
#> QTL2_main_DK105 0.248 1.692 0.02 1 8.836855e-01
#> QTL2_CIAM_UH007 NA NA NA NA NA <NA>
#> QTL2_CIAM_D152 -2.391 2.762 0.75 1 3.866235e-01
#> QTL2_CIAM_F03802 -0.363 2.011 0.03 1 8.567614e-01
#> QTL2_CIAM_F2 -0.617 3.148 0.04 1 8.445821e-01
#> QTL2_CIAM_F283 -0.122 2.008 0.00 1 9.516950e-01
#> QTL2_CIAM_DK105 0.859 2.353 0.13 1 7.150577e-01
#> QTL2_TUM_UH007 NA NA NA NA NA <NA>
#> QTL2_TUM_D152 -3.664 2.609 1.97 1 1.601385e-01
#> QTL2_TUM_F03802 -7.902 1.871 17.84 1 2.407956e-05 ***
#> QTL2_TUM_F2 -4.382 2.942 2.22 1 1.363365e-01
#> QTL2_TUM_F283 -7.312 1.879 15.14 1 9.969571e-05 ***
#> QTL2_TUM_DK105 -2.741 2.198 1.55 1 2.124397e-01
#> QTL2_INRA_UH007 NA NA NA NA NA <NA>
#> QTL2_INRA_D152 -0.708 2.051 0.12 1 7.299216e-01
#> QTL2_INRA_F03802 1.853 1.488 1.55 1 2.130998e-01
#> QTL2_INRA_F2 -2.172 2.347 0.86 1 3.547953e-01
#> QTL2_INRA_F283 1.693 1.496 1.28 1 2.576782e-01
#> QTL2_INRA_DK105 -0.137 1.750 0.01 1 9.375956e-01
#> QTL2_KWS_UH007 NA NA NA NA NA <NA>
#> QTL2_KWS_D152 NA NA NA NA NA <NA>
#> QTL2_KWS_F03802 NA NA NA NA NA <NA>
#> QTL2_KWS_F2 NA NA NA NA NA <NA>
#> QTL2_KWS_F283 NA NA NA NA NA <NA>
#> QTL2_KWS_DK105 NA NA NA NA NA <NA>
Once the QTL alleles with a significant interaction with the
environment have been determined, it is possible to investigate if those
alleles are sensitive to specific environmental covariates using the
function QTL_effect_main_QxEC()
. This function estimates
model 5. For that purpose, you can pass the results obtained with
QTL_effect_main_QEI()
to the Qmain_QxE
argument from QTL_effect_main_QxEC()
. You also need to
provide the environmental covariate (EC) information in the format of a
numeric matrix with environment in row and EC in column
# provide the environmental covariate information as a matrix
EC <- matrix(c(180, 310, 240, 280), 4, 1)
rownames(EC) <- c('CIAM', 'TUM', 'INRA', 'KWS')
colnames(EC) <- 'cum_rain'
Qeff <- QTL_effect_main_QxEC(mppData = mppData_GE,
trait = c('DMY_CIAM', 'DMY_TUM', 'DMY_INRA_P', 'DMY_KWS'),
env_id = c('CIAM', 'TUM', 'INRA', 'KWS'),
QTL = QTL, EC = EC, Qmain_QEI = Qeff, thre_QTL = 1.301)
The function extends the QTLxE term for the parental allelic effects
with a \(-log10(p-value)\) superior to
the thre_QTL
argument. It returns a list with the second
element ‘Qeff_EC’ being a list with the \(\alpha_p\) (all parental alleles) and \(S_p\) (only parental allele with
significant QTLxE) estimates. For example, with a \(-log10(p-value)\) for the \(S_p\) term equal to 2.26 and 2.08, the
‘F03802’ and the ‘F283’ parental allele interact significantly with the
cumulated rain.
Qeff$Qeff_EC$QTL2
#> B_main logP_main B_cum_rain logP_cum_rain
#> UH007 NA NA NA NA
#> D152 -3.1931542 0.96792428 NA NA
#> F03802 12.0348838 2.31797894 -0.044433097 2.2647914
#> F2 -2.9970562 0.18508865 -0.008060809 0.1266095
#> F283 10.2417600 1.78749085 -0.042232587 2.0862106
#> DK105 0.1792433 0.03829626 NA NA
It is possible to estimate the effect of several ECs but user should be careful to have enough degrees of freedom (df) to estimate those sensitivity curves. For example, with four environments, you should only estimate the effect of one EC at a time because you only have two remaining degrees of freedom (4 env - 1 df main effect - 1 df sensitivity slope).
After calculation, it is possible to plot the sensitivity slopes
estimated with the function QTL_effect_main_QxEC()
. The
slope represent the rate of change in the parental allelic effect given
change in EC with respect to the reference parent.
We tested the proposed procedure on several sorghum BCNAM populations as well as maize NAM population and breeding MPPs. The table below show the computation time required for different configurations. Computations were realized on an AMD Ryzen 7-cores 5800 U processor with 16 GB RAM.
Populations | Ngeno | Nmarker | Nenv | Ncore | SIM [m] | CIM [h] | QTL_effects [m] | Total [h] |
---|---|---|---|---|---|---|---|---|
BCNAM Grinkan | 1598 | 51545 | 4 | 4 | 20.0 | 11.00 | 15.0 | 11.60 |
BCNAM Kenin-Keni | 575 | 51545 | 4 | 4 | 1.5 | 0.17 | 1.5 | 0.25 |
BCNAM Lata | 896 | 51545 | 3 | 4 | 3.0 | 0.33 | 2.0 | 0.40 |
EUNAM Dent | 841 | 18621 | 4 | 1 | 18.0 | 9.50 | 19.0 | 10.20 |
EUNAM Flint | 811 | 5949 | 6 | 1 | 15.0 | 16.00 | 150.0 | 19.00 |
Breeding pop1 | 2071 | 1812 | 3 | 1 | 5.0 | 1.30 | 9.0 | 1.50 |
Breeding pop2 | 820 | 1760 | 5 | 1 | 9.0 | 10.20 | 33.0 | 10.80 |
From a general point of view, even though we tried to reduce as much
as possible the computation time, it can still take several hours to
perform a complete analysis. The computation time depends principally on
the population size (number of genotypes), the number of environments,
as well as the number of markers. The use of multiple cores
(n.cores
argument) is a first possibility to reduce the
computational time.
The computation of the CIM part including cofactors is the most
demanding. The whole procedure time can be largely reduced if the user
only perform a SIM (SIM_only = TRUE
in
mppGE_proc()
). Generally, the large effect QTLs are the
same or fall in similar region in the SIM and CIM analysis. The number
of cofactor position selected will also considerably extend the
computational time. For that reason we advise to restrict their number
by selecting maximum one cofactor per chrosome.
Ultimately, even if the computation time can be long we should always put that in balance with the time required to develop the population, phenotype it and get the genotypic data.