library(reslife)
#> Loading required package: pracma
#> Loading required package: flexsurv
#> Loading required package: survival
#> Loading required package: gsl
#>
#> Attaching package: 'gsl'
#> The following objects are masked from 'package:pracma':
#>
#> Ci, Si, erf, erfc, eta, expint_E1, expint_Ei, fact, hypot, psi,
#> sinc, zeta
This function exclusively uses flexsurvreg() objects as the source of the data. We will show how to use the function below. First, we’ll start off with a simple example and show how the mean produced is equal to the integration.
We run a flexsurvreg() with a single covariate for the ‘gengamma.orig’ distribution. We then run the reslifefsr() function, using the flexsurvreg() object. We have a life of 1, which is a scalar value at which we calculate the residual life, and specify that we only want the mean.
library(flexsurv)
fs1 <- flexsurvreg(Surv(recyrs, censrec) ~ 1, data = bc,dist = "gengamma.orig")
r <- reslifefsr(obj = fs1, life = 1, type = 'mean')
r
#> [1] 7.194142
We can verify the accuracy of our derivation using the integrate function.
b <- exp(as.numeric(fs1$coefficients[1]))
a <- exp(as.numeric(fs1$coefficients[2]))
k <- exp(as.numeric(fs1$coefficients[3]))
# numerical integration
f <- function(x) {
return(pgengamma.orig(x, shape = b, scale = a, k = k, lower.tail = FALSE))
}
mx_interal <- integrate(f, 1, Inf)$value/pgengamma.orig(1, shape = b, scale = a, k = k, lower.tail = FALSE)
mx_interal
#> [1] 7.194142
As you can see, the mean values are the same.
What if the flexsurvreg() object has multiple covariates? We will see this below. We will use the ‘bc’ dataset contained within ‘flexsurv’ with an added covariate ‘age’.
data(bc)
newbc <- bc
newbc$age <- rnorm(dim(bc)[1], mean = 65-scale(newbc$recyrs, scale=FALSE),sd = 5)
Generating a flexsurvreg object:
We can now use the reslifefsr() function to generate residual life values. There are 686 values, since the length of the input is 686. We are only going to show 10, in order to save space.
head(reslifefsr(obj = fsr2, life= 4), 10)
#> [1] 10.991160 7.445017 11.888460 8.855456 7.320557 5.101370 7.224315
#> [8] 8.983472 4.498345 10.005028
If we want to look at the median, we can specify that.
head(reslifefsr(obj = fsr2,life= 4, type = 'median'),10)
#> [1] 8.893677 5.931759 9.646690 7.106529 5.828371 3.995473 5.748458 7.213403
#> [9] 3.502050 8.067468
We can look at all 3 types now (mean, median, and percentile).
head(reslifefsr(obj = fsr2, life= 4, p = .8, type = 'all'),10)
#> mean median percentile
#> 1 10.991160 8.893677 17.531507
#> 2 7.445017 5.931759 11.945311
#> 3 11.888460 9.646690 18.939333
#> 4 8.855456 7.106529 14.172096
#> 5 7.320557 5.828371 11.748439
#> 6 5.101370 3.995473 8.225509
#> 7 7.224315 5.748458 11.596157
#> 8 8.983472 7.213403 14.373850
#> 9 4.498345 3.502050 7.263311
#> 10 10.005028 8.067468 15.981979
Notice how when ‘all’ is used, the output is a dataframe with the name of each value as the column names.
But what if we want to supply our own data and make predictions? Since the ‘flexsurv’ regression object generates parameters, we can add our own data points and get the residual life values for each.
Let’s start by generating some data. This data has the same two levels as ‘fsr2’ above (group, age). Note that the data is placed in a data frame. The new data must be inputted as a data frame for the function to work.
We run that through the reslifefsr() function and get a 3 row data frame
reslifefsr(obj = fsr2,life= 4,p = .6, type = 'all', newdata= newdata)
#> mean median percentile
#> 1 12.300164 9.992529 12.53526
#> 2 31.905635 26.564496 32.84741
#> 3 7.674191 6.122261 7.74986
Now we can show off some features of the function, specifically how it handles different input data.
Lets change the order of the variables in the input data and rerun. It will show the exact same results as above.
newdata = data.frame(group, age)
reslifefsr(obj = fsr2,life= 4,p = .6, type = 'all', newdata= newdata)
#> mean median percentile
#> 1 12.300164 9.992529 12.53526
#> 2 31.905635 26.564496 32.84741
#> 3 7.674191 6.122261 7.74986
This is a handy feature of the function that can enhance the user experience. Another feature of this is the ability to handle data with extra columns. Let’s show that feature below.
extra = c(100, 100, 100)
newdata2 = data.frame(age, group, extra)
reslifefsr(obj = fsr2,life= 4,p = .6, type = 'all', newdata= newdata2)
#> mean median percentile
#> 1 12.300164 9.992529 12.53526
#> 2 31.905635 26.564496 32.84741
#> 3 7.674191 6.122261 7.74986
The extra column ‘extra’ is ignored by the function.
Are there scenarios where the package would return an error? Yes, in fact, there are a few.
The first is if not enough data is provided, which is shown below.
newdata3 = data.frame(group)
reslifefsr(obj = fsr2, life = 4,p = .6, type = 'all', newdata= newdata3)
#> Warning in if (fsoutput$covdata$covnames != colnames(newdata)) {: the condition
#> has length > 1 and only the first element will be used
#> mean median percentile
#> 1 87.22487 73.51266 90.10934
#> 2 150.23904 127.02587 155.29444
#> 3 47.56366 39.84435 49.06227
Since only ‘group’ is provided, predictions cannot be generated. The function needs values for ‘age’ in order to make predictions.
Another scenario could happen with factor data. The levels for ‘group’ are ‘Good’, ‘Medium’, and ‘Poor’. But what if a level was inputted that isn’t one of those three? Let’s see.
group = c("Medium", 'Good', "Terrible")
newdata = data.frame(age, group)
reslifefsr(obj = fsr2,life= 4,p = .6, type = 'all', newdata= newdata)
#> Error in weibull_mlr(obj, life, p, type, newdata): Incorrect Level Entered
Since ‘Terrible’ is not a valid level, the function errors and returns a message notifying the user of this error.
It is also possible to plot the mean residual life for a set of values from a ‘flexsurv’ object.
We take the first line from newbc and create a blank vector of length 10. We then loop the function to create the MRLs for the first 10 integers, which gives us the expected survival time given the patient has survived y years.
We can then plot these values.
newdata4 = newbc[1,]
mrl = rep(0,10)
for (i in 1:10){
mrl[i] = reslifefsr(obj = fsr2,life= i,p = .6, type = 'mean', newdata= newdata4)
}
plot(x=c(1:10), y=mrl,type="l", xlab = 'Years Survived', ylab = 'Mean Residual Life', main = 'MRL over Time')
That is an overview of how to use the flexsurvreg() dependent function. Next, we will show how to get residual life values by inputting your own parameters.
This function works similarly to the reslifefsr() function, except it doesn’t rely on a flexsurvreg() input. Instead, the user inputs the name of the distribution and the parameters. The distribution must be one contained within ‘flexsurv’ (‘gamma’, ‘gengamma.orig’, ‘weibull’, ‘gompertz’, ‘exponential’, ‘lnorm’, ‘llogis’, ‘gengamma’, ‘genf’, ‘genf.orig’), and the names of the parameters must match those specified for each distribution.
We will show a demonstration below.
The inputs for the function are simple. Values is the first argument, which are the values for which the residual life will be calculated. Values is usually a vector or a seq function. The next argument is the distribution name, ‘weibull’. The last argument are the parameters, inputted as a vector. The names, shape and scale, are specified in the vector.
For the examples in this vignette, we will use the output from a flexsurvreg() using the bc dataset.
fsr3 = flexsurvreg(Surv(recyrs, censrec) ~1, data=newbc, dist = 'weibull')
fsr3$res
#> est L95% U95% se
#> shape 1.271519 1.153372 1.401770 0.06326773
#> scale 6.191377 5.604203 6.840071 0.31475730
We use a sequence of values from 1 to 10, counting up by .5, for this example.
residlife(values = seq(1,10,.5), distribution = 'weibull', parameters= c(shape= 1.272, scale = 6.191))
#> [1] 5.280618 5.125825 4.994025 4.878801 4.776287 4.683907 4.599837 4.522725
#> [9] 4.451537 4.385457 4.323832 4.266128 4.211904 4.160787 4.112464 4.066666
#> [17] 4.023161 3.981746 3.942246
What if the parameter names are incorrect? The function will return an error.
residlife(values = seq(1,10,.5), distribution = 'weibull', parameters= c(shape= 1.272, not_scale = 6.191))
#> Error in residlife(values = seq(1, 10, 0.5), distribution = "weibull", : incorrect parameters entered. Parameters for weibull are shape and scale
For distributions where there are different parameterizations, the function can be flexible.
For example, ‘gamma’ has both shape/rate and shape/scale parameterizations. The function works for both, as long as the parameters are correctly specified in the input.
residlife(values = seq(1,10,.5), distribution = 'gamma', parameters= c(shape= 1.272, scale = 6.191))
#> [1] 7.498217 7.389908 7.302911 7.230487 7.168736 7.115159 7.068047 7.026172
#> [9] 6.988622 6.954700 6.923859 6.895665 6.869766 6.845874 6.823747 6.803186
#> [17] 6.784020 6.766103 6.749311
residlife(values = seq(1,10,.5), distribution = 'gamma', parameters= c(shape= 1.272, rate = 1/6.191))
#> [1] 7.498217 7.389908 7.302911 7.230487 7.168736 7.115159 7.068047 7.026172
#> [9] 6.988622 6.954700 6.923859 6.895665 6.869766 6.845874 6.823747 6.803186
#> [17] 6.784020 6.766103 6.749311
Much like the reslifefsr() function, the reslife() function has the ability to show mean, median, percentile, or all three.
residlife(values = seq(1,10,.5), distribution = 'weibull', parameters= c(shape= 1.272, scale = 6.191), p = .7, type = 'all')
#> mean median percentile
#> 1 5.280618 4.151524 6.619995
#> 2 5.125825 3.988283 6.423757
#> 3 4.994025 3.851217 6.253251
#> 4 4.878801 3.733262 6.102230
#> 5 4.776287 3.629997 5.966703
#> 6 4.683907 3.538406 5.843883
#> 7 4.599837 3.456318 5.731711
#> 8 4.522725 3.382114 5.628606
#> 9 4.451537 3.314545 5.533322
#> 10 4.385457 3.252635 5.444853
#> 11 4.323832 3.195598 5.362376
#> 12 4.266128 3.142799 5.285206
#> 13 4.211904 3.093714 5.212767
#> 14 4.160787 3.047908 5.144570
#> 15 4.112464 3.005015 5.080196
#> 16 4.066666 2.964725 5.019284
#> 17 4.023161 2.926773 4.961518
#> 18 3.981746 2.890929 4.906624
#> 19 3.942246 2.856997 4.854361
The outputs of this function can then easily be taken and used in a plot or any other type of analysis tool.
Let’s look at an example of such a plot.
Say we want to plot the mean residual life over time. First, we would create a vector for time, run it through the residlife() function with type = ‘mean’, and plot the output against time.
After extracting the parameters, we can run residlife() and generate an output vector. Note that since residlife() can take a vector input, we don’t have to use a loop.