OBL: Optimum Block Length for Bootstrap Methods

library(OBL)

Introduction

ARIMA method has limitations in the area of small sample sizes among others. although, analysis of small sample series are available in few cases, there is currently no widely applicable and easily accessible method that can be used to make small sample inference. Methods like Edgeworth’s expansions involve a lot of algebra (which might discourage its users) and are also applicable in very special cases. The regular bootstrap method that could be a potential alternative failed on the grand of conflicting assumptions. The normal bootstrap method depends on assumption that observations are independent and identically distributed (i.i.d.), while a typical time series data are dependent in nature.

The Need for Block Bootstrap Methods

To find a way to ovoid this assumption of i.i.d. on normal bootstrap method in Efron (1979) and still maintain the dependence structure of time series data, one can hold reasonable amount of dependence structure within the series in a way by slicing a time series data into a number of chunks each with a length l. This way the dependence structure within each block is kept. Instead of sampling each unit randomly with replacement (as it would have been done for traditional bootstrap method), the chunks are rather sampled. This will distort certain amount of dependence structure of the series only among blocks are distorted (as serial correlation is distorted among the blocks), while the i.i.d. is invariably preserved. This way, one is able to coerce the i.i.d. assumption of the regular bootstrap method and the assuption of presence of serial correlation of a typical time series data in one method. The broad name given to method that achieves these two opposing objective is called Block Bootstrap Methods.

Statement of Problem

The main challenge with block bootstrap procedures is the responsiveness of Root Mean Squared Error (RMSE) to the preference of block length (l), or the number of blocks (m). This is one problem define in two ways, OBL: Optimum Block Lengt package has chosen to approached this problem with the preference of block length. Diverse methods can be used (which are explained briefly below), each method has numerous block lengths which must be considered, it is this problem that the OBL: Optimum Block Lengt package is here to solve.

About the Package

The OBL package provides optimum block length to five(5) different block bootrap methods vized:

  1. The Non-overlapping Block Bootstrap (NBB) uses a method described in Carlstein (1986) which splits original series into Non-overlapping blocks and thereafter resamples the blocks in multiple times(which is named R) to form a new series.

  2. The Moving Block Bootstrap (MBB) otherwise called Overlaping Block Bootstrap uses a method described in Kunsch (1989) which splits original series into overlapping blocks and thereafter resamples the blocks in multiple times(which is named R) to form a new series.

  3. The Circular Block Bootstrap (CBB) uses a method described in Politis and Romano (1992) is an improvement on MBB Kunsch (1989) such that in which provisions are made for observations at the tail end of the original series that could have been cut off from resampling simply because the left over element(s) is not equal to predetermined block length. This happens when original series is not divisible by \(n - l + 1\), where \(n\) is the number of original series and \(l\) is the predetermined block sizes \(1 < l < n\). Such provision is made up by completing the so called left-over by adding the first element(s) of the original series to form a circle. Afterwards, the blocks in multiple times(which is named R) are resampled to form a new series.

  4. The Tapered Moving Block Bootstrap (TMBB unpublished) is formed to reduce the less representative presence of extreme member of the series from \(2l\) to just 2.. Reduction of less-represented elements of the series will help to increase the performance of model evaluation metrics (RMSE and MAE). Afterwards, the blocks in multiple times(which is named R) are resampled to form a new series.

  5. The Tapered Circular Block Bootstrap (TCBB unpublished) is an extension from TMBB such that the last block contains the first element of the parent series as its last sub-series element. It is formed to reduce the less representative presence of extreme member of the series from \(2l\) (in the case of MBB) and from 2 (in the case of TMBB) to just 1. Reduction of less-represented element of the series will help to increase the performance of model evaluation metrics (which leads to reduced RMSE). Afterwards, the blocks in multiple times(which is named R) are resampled to form a new series.

It also checks for every possible block length l (where \(1 < l < n\) for \(n\) is the length of the original time series data) in each method to know which one is optimal by calculating RMSE value for every possible block length of each method and sorting out which of them is minimum in value. The minimum RMSE values for every method is sorted out in a data frame(with three(3) columns namely: Methods, lb and RMSE) to let the OBL: Optimum Block Lengt package users choose the method and the block length with the minimal RMSE value from the output data frame

Instalation

You can install the development version from GitHub with:

install.packages("devtools")
devtools::install_github("sta189332/OBL")

Description

It is observed that the optimum block length of any time series data is contingent(dependent) on the uniqueness of every time series data. Block bootstrap users thus, need to be flexible in choosing the optimum block length by adopting to a concise but clear while such method must be easy to use as well. As a result of the such a need, OBL: Optimum Block Lengt package is created to solve such problem. OBL: Optimum Block Length package helps users to search for the best block length and the best method that has the minimum RMSE value.
blockboot function produces a data frame with three (3) column (Method, lb & RMSE).

lolliblock function is another function that can plot the lollipop chart of the data frame displays by the blockboot function. It shows the optimum block lengths, for each method with different colours ranging from red to green. While red shows the method with worst performance (method with the highest RMSE) the green colour shows the method with the smallest RMSE. The corresponding block length of each methods as a legend with their matching colours.

Usage

The minimum arguments in the function blockboot() can be the ts which should be a univirate time series data and R which is the numbers of replicate of resapling.

blockboot(ts,
          R,
         seed,
         n_cores,
         methods = c("optnbb", "optmbb", "optcbb", "opttmbb", "opttcbb"))

While the minimum arguments in the function lolliblock() can be the ts which should be a univirate time series data and R which is the numbers of replicate of resapling.

lolliblock(ts,
          R,
         seed,
         n_cores,
         methods = c("optnbb", "optmbb", "optcbb", "opttmbb", "opttcbb"))

Argument

ts

R

seed

n_cores

Methods

univariate time series data

Number of replication for resampling

RNG seed

number of core(s) to be used on your operating system

methods is optional, if specified, it must be any combination as follows: “optnbb”, “optmbb”, “optcbb”, “opttmbb”, “opttcbb”

Output

The suction output a data frame with 5 rows 3 columns which are “Methods”, “lb” and “RMSE”. Method with the minimum RMSE value is

Examples

# simulate univariate time series data
set.seed(289805)
ts <- arima.sim(n = 10, model = list(ar = 0.8, order = c(1, 0, 0)), sd = 1) 
# get the optimal block length table
OBL::blockboot(ts = ts, R = 100, seed = 6, n_cores = 2)
#  Methods lb      RMSE
#1     nbb  9 0.2402482
#2     mbb  9 0.1023012
#3     cbb  8 0.2031448
#4    tmbb  4 0.2654746
#5    tcbb  9 0.4048711

The suction output a lollipop chart with 5 pops for the 5 methods separated with 5 distinct colours while the method with red lollipop indicates the least desired method with the highest RMSE and the method with green lollipop indicates the preferred method having the lowest RMSE. The legend beside the chart indicate the optimum block length for each method.

Examples2

# simulate univariate time series data
set.seed(289805)
ts <- arima.sim(n = 10, model = list(ar = 0.8, order = c(1, 0, 0)), sd = 1) 
# get the optimal block length table
OBL::lolliblock(ts = ts, R = 100, seed = 6, n_cores = 2)

vignette

vignette(“factors.cc”, package=“rQCC”)

Reference

Carlstein, Edward. 1986. “The Use of Subseries Values for Estimating the Variance of a General Statistic from a Stationary Sequence.” The Annals of Statistics, 1171–79.
Efron, B. 1979. “Bootstrap Methods: Another Look at the Jackknife.” Ann. Statist. 7 (1): 1–26.
Kunsch, Hans R. 1989. “The Jackknife and the Bootstrap for General Stationary Observations.” The Annals of Statistics, 1217–41.
Politis, Dimitris N, and Joseph P Romano. 1992. “A Circular Block-Resampling Procedure for Stationary Data.” Exploring the Limits of Bootstrap, 263–70.