Bootstrap and jackknife are closely related statistical techniques that both rely on resampling the data. They are nonparametric methods for estimating the standard error and confidence interval of an estimate of a population parameter, such as a mean, median, proportion, odds ratio, correlation coefficient or regression coefficient. Bradley Efron introduced the bootstrap in 1979 as a way to evaluate the variance of an estimator. The jackknife is older: Quenouille introduced it in 1949 to estimate the bias of an estimator, and it is also used to evaluate an estimator's variance. Both methods are also helpful when calculating an appropriate sample size for experimental design.
Examples of bootstrap & jackknife
This section works through an example of the bootstrap. Suppose that the original sample consists of five data points:
5, 4, 8, 9, 7.
Resample the data points with replacement from the original sample to create bootstrap samples. Each bootstrap sample has five values, the same size as the original sample. Because the data points are selected at random, the bootstrap samples may differ from the original sample and from one another.
The table below shows an example of 20 bootstrap samples, each followed by its mean:
|4, 9, 8, 7, 4 = 6.4|5, 8, 7, 9, 4 = 6.6|9, 9, 7, 4, 5 = 6.8|8, 8, 7, 4, 7 = 6.8|8, 5, 7, 8, 5 = 6.6|
|5, 7, 7, 7, 8 = 6.8|9, 8, 8, 8, 9 = 8.4|4, 9, 5, 7, 4 = 5.8|5, 4, 7, 4, 5 = 5|8, 4, 5, 9, 4 = 6|
|7, 4, 9, 8, 4 = 6.4|5, 9, 8, 4, 7 = 6.6|9, 9, 9, 4, 5 = 7.2|5, 5, 4, 4, 4 = 4.4|4, 4, 5, 7, 5 = 5|
|5, 7, 5, 5, 8 = 6|9, 7, 4, 8, 5 = 6.6|7, 8, 8, 5, 4 = 6.4|8, 5, 4, 8, 7 = 6.4|5, 4, 5, 5, 5 = 4.8|
Table 1 Creating bootstrap samples from the original sample
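The resampling step behind Table 1 can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the procedure that produced the table: the function name `bootstrap_samples` and the fixed seed are assumptions made for the example, and the samples it draws will differ from those shown in Table 1.

```python
import random
from statistics import mean

def bootstrap_samples(data, n_samples, seed=None):
    """Draw n_samples resamples of len(data) values, with replacement."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    return [rng.choices(data, k=len(data)) for _ in range(n_samples)]

original = [5, 4, 8, 9, 7]
samples = bootstrap_samples(original, n_samples=20, seed=0)

# Each resample has the same size as the original sample,
# and a value may appear more than once within a resample.
for sample in samples:
    print(sample, "mean =", round(mean(sample), 2))
```

Sampling with replacement is what lets a resample differ from the original: some values are repeated and others are left out.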
In this example, the bootstrap is used to calculate a confidence interval for the population mean. First, calculate the mean of each bootstrap sample. Arranged in ascending order, the 20 means are: 4.4, 4.8, 5, 5, 5.8, 6, 6, 6.4, 6.4, 6.4, 6.4, 6.6, 6.6, 6.6, 6.6, 6.8, 6.8, 6.8, 7.2 and 8.4.
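Next, calculate the confidence interval from the bootstrap sample means. Since a 95% confidence interval is the most common choice, use the 2.5th and 97.5th percentiles as the endpoints of the interval. This is because the remaining (100% - 95%) = 5% is split equally between the two tails, which leaves the middle 95% of the bootstrap sample means.
With the 20 means from Table 1, the lowest mean is 4.4 and the highest is 8.4, and interpolating between neighboring ordered means gives an interval of roughly 4.6 to 7.8. In other words, there is approximately 95% confidence that the population mean lies between these values. With only 20 bootstrap samples the endpoints are coarse; in practice many more resamples are drawn, such as the 1000 used in the case study below.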
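The percentile method can be sketched in Python using the 20 bootstrap means from Table 1. The sketch takes the 2.5th and 97.5th percentiles as the endpoints of a 95% interval. The helper name `percentile` is hypothetical, and its linear interpolation between ordered values is an assumed convention; other percentile definitions give slightly different endpoints, especially with so few resamples.

```python
# Means of the 20 bootstrap samples listed in Table 1.
boot_means = [6.4, 6.6, 6.8, 6.8, 6.6,
              6.8, 8.4, 5.8, 5.0, 6.0,
              6.4, 6.6, 7.2, 4.4, 5.0,
              6.0, 6.6, 6.4, 6.4, 4.8]

def percentile(values, p):
    """Return the p-th percentile (0-100) by linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p / 100      # fractional position in the ordered list
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

confidence = 95
tail = (100 - confidence) / 2               # 2.5% in each tail
ci_low = percentile(boot_means, tail)
ci_high = percentile(boot_means, 100 - tail)
print(f"{confidence}% percentile interval: {ci_low:.2f} to {ci_high:.2f}")
```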
Case study using SPSS
In this article, bootstrapping is performed for two variables, height and weight. Bootstrapping can be applied in SPSS, where it is available for a number of different analyses. Here it is performed with a Pearson correlation analysis. Bootstrap and jackknife are most useful when the data do not follow a normal distribution.
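Outside SPSS, the idea of bootstrapping a Pearson correlation can be sketched in plain Python. The height and weight values below are made up for illustration and are not the article's dataset; the function names `pearson_r` and `bootstrap_r` are likewise hypothetical. Observations are resampled as pairs so that each height stays with its weight.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation; returns None if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)

def bootstrap_r(xs, ys, n_boot=1000, seed=0):
    """Return the sorted correlations of n_boot paired resamples."""
    rng = random.Random(seed)
    n = len(xs)
    results = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample row indices
        r = pearson_r([xs[i] for i in idx], [ys[i] for i in idx])
        if r is not None:                            # skip degenerate resamples
            results.append(r)
    return sorted(results)

# Hypothetical data, for illustration only.
heights_cm = [150, 155, 160, 165, 170, 172, 175, 180, 185, 190]
weights_kg = [52, 58, 55, 63, 68, 66, 74, 79, 77, 88]

r_obs = pearson_r(heights_cm, weights_kg)
boot = bootstrap_r(heights_cm, weights_kg)
ci_low = boot[int(0.025 * len(boot))]        # approximate 2.5th percentile
ci_high = boot[int(0.975 * len(boot)) - 1]   # approximate 97.5th percentile
print(f"r = {r_obs:.3f}, 95% bootstrap interval: {ci_low:.3f} to {ci_high:.3f}")
```

Because the interval is read directly from the resampled correlations, it does not depend on the two variables being normally distributed.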
Analyze > descriptive summary
This case uses the same dataset as in the logistic regression article. Table 1 shows the descriptive statistics without bootstrapping: the mean score of four different variables, together with the standard error, skewness and kurtosis values.
Next, bootstrapping was performed for the descriptive analysis of the same data, using 1000 bootstrap samples and a 95% confidence interval.
Analyze > Descriptive analysis > bootstrapping
The figure above shows the bootstrapping results for interview scores. In this case the standard error of the mean changes from 1.67 without bootstrapping to 1.58 with bootstrapping. Bootstrapping does not alter the data itself; it provides an estimate of the standard error, and of the bias of the estimate, that does not rely on the assumption of normality.
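The comparison between a conventional and a bootstrap standard error of the mean can be sketched as follows. The SPSS dataset is not reproduced here, so the sketch reuses the five data points from the earlier example as a stand-in; the seed and variable names are assumptions. The conventional estimate is the sample standard deviation divided by the square root of n, and the bootstrap estimate is the standard deviation of the 1000 resampled means.

```python
import random
from math import sqrt
from statistics import mean, stdev

scores = [5, 4, 8, 9, 7]   # stand-in data: the five points from the earlier example

# Conventional standard error of the mean: s / sqrt(n).
analytic_se = stdev(scores) / sqrt(len(scores))

# Bootstrap standard error: spread of the means of 1000 resamples.
rng = random.Random(42)
boot_means = [mean(rng.choices(scores, k=len(scores))) for _ in range(1000)]
bootstrap_se = stdev(boot_means)

print(f"conventional SE = {analytic_se:.3f}")
print(f"bootstrap SE    = {bootstrap_se:.3f}")
```

The two numbers are different estimates of the same quantity, so a small gap between them is expected and varies with the seed and the number of resamples.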
Applications of bootstrap & jackknife
- Analyzing null models, competition and community structure.
- Detecting density dependence.
- Characterizing spatial patterns and processes.
- Estimating population size and vital rates.
- Building environmental models.
- Studying evolutionary processes and rates.
- Conducting phylogenetic analysis.
- Calculating an appropriate sample size for experimental design.
- Calculating an estimator that is a sample analogue of a parameter.
- Estimating the bias and standard error of a statistic calculated from a random sample of observations.
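The last application, estimating the bias and standard error of a statistic, can be sketched with the jackknife. The sketch below recomputes the statistic with each observation left out in turn and combines the results using the standard jackknife formulas. The function name `jackknife` is hypothetical, the data are the five points from the earlier example, and the plug-in variance (dividing by n) is chosen only because it is a familiar biased statistic.

```python
from math import sqrt
from statistics import mean, pvariance

def jackknife(data, statistic):
    """Return (estimate, jackknife bias, jackknife standard error)."""
    n = len(data)
    theta_hat = statistic(data)
    # Leave-one-out estimates: drop observation i, recompute the statistic.
    loo = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = mean(loo)
    bias = (n - 1) * (loo_mean - theta_hat)
    se = sqrt((n - 1) / n * sum((t - loo_mean) ** 2 for t in loo))
    return theta_hat, bias, se

data = [5, 4, 8, 9, 7]
estimate, bias, se = jackknife(data, pvariance)   # pvariance divides by n
corrected = estimate - bias                       # bias-corrected estimate

print(f"plug-in variance = {estimate:.2f}")
print(f"jackknife bias   = {bias:.2f}")
print(f"corrected        = {corrected:.2f}")
print(f"jackknife SE     = {se:.3f}")
```

For the plug-in variance, the bias-corrected value equals the usual sample variance with the n - 1 denominator, which is 4.3 for these data.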
A number of software packages support these methods, including R, SAS, S-PLUS, Resampling Stats, MATLAB, Stata and SPSS.