How to perform bootstrap and jackknife analysis?

By Prateek Sharma & Priya Chetty on February 26, 2018

Bootstrap and jackknife are superficially similar statistical techniques that both involve resampling the data. They are nonparametric techniques that can estimate the standard error and confidence interval of an estimate of a population parameter. Such parameters include the mean, median, proportion, odds ratio, correlation coefficient and regression coefficient. Bradley Efron introduced the bootstrap method in 1979 for evaluating the variance of an estimator. Quenouille, on the other hand, introduced the jackknife method in 1949 to estimate the bias of an estimator; it was later extended to estimating the variance as well. Bootstrap and jackknife are also helpful in calculating an appropriate sample size for experimental design.

Examples of bootstrap & jackknife

This section presents a worked example of applying the bootstrap. Suppose that there are five data points:

5, 4, 8, 9, 7.

Resample the data points with replacement from the original sample to create bootstrap samples. Each bootstrap sample has a size of five, the same as the original sample. Since the data points are selected at random, the bootstrap samples may differ from the original sample and from each other.

The table below shows an example of 20 bootstrap samples, each followed by its mean:

4, 9, 8, 7, 4 = 6.4	5, 8, 7, 9, 4 = 6.6	9, 9, 7, 4, 5 = 6.8	8, 8, 7, 4, 7 = 6.8	8, 5, 7, 8, 5 = 6.6
5, 7, 7, 7, 8 = 6.8	9, 8, 8, 8, 9 = 8.4	4, 9, 5, 7, 4 = 5.8	5, 4, 7, 4, 5 = 5	8, 4, 5, 9, 4 = 6
7, 4, 9, 8, 4 = 6.4	5, 9, 8, 4, 7 = 6.6	9, 9, 9, 4, 5 = 7.2	5, 5, 4, 4, 4 = 4.4	4, 4, 5, 7, 5 = 5
5, 7, 5, 5, 8 = 6	9, 7, 4, 8, 5 = 6.6	7, 8, 8, 5, 4 = 6.4	8, 5, 4, 8, 7 = 6.4	5, 4, 5, 5, 5 = 4.8

Table 1: Creating bootstrap samples from the original sample
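The resampling step can be sketched in a few lines of Python. This is only an illustration using the standard library; the function name `bootstrap_samples` and the seed value are assumptions made for this example, and because the draws are random the output will not reproduce the samples in Table 1.

```python
import random

data = [5, 4, 8, 9, 7]  # the original sample from the example


def bootstrap_samples(values, n_samples, seed=None):
    """Draw n_samples resamples with replacement, each the size of values."""
    rng = random.Random(seed)
    return [rng.choices(values, k=len(values)) for _ in range(n_samples)]


# The seed is an arbitrary choice that only makes the run repeatable.
samples = bootstrap_samples(data, n_samples=20, seed=1)
for sample in samples:
    print(sample, "mean =", sum(sample) / len(sample))
```

Every resample contains only values from the original sample, but some values may appear several times and others not at all.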

Results

In this case, the bootstrap is used to calculate a confidence interval for the population mean. First, calculate the mean of each bootstrap sample. Arranged in ascending order, the 20 mean values are: 4.4, 4.8, 5, 5, 5.8, 6, 6, 6.4, 6.4, 6.4, 6.4, 6.6, 6.6, 6.6, 6.6, 6.8, 6.8, 6.8, 7.2 and 8.4.

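Next, calculate the confidence interval from the bootstrap sample means. Since a 95% confidence interval is the most common, use the 2.5th and 97.5th percentiles as the endpoints of the interval. This is because the remaining (100% – 95%) = 5% is split equally between the two tails, which leaves the middle 95% of the bootstrap sample means.

With only 20 bootstrap means, these percentiles have to be interpolated between the sorted values. Using linear interpolation, the 2.5th percentile is about 4.6 and the 97.5th percentile is about 7.8. In other words, the 95% bootstrap confidence interval for the population mean is roughly 4.6 to 7.8. An interval based on so few bootstrap samples is crude; in practice, 1000 or more bootstrap samples are used.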
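The percentile calculation can be checked with a short Python sketch. It takes the 20 means from Table 1 and reads off the 2.5th and 97.5th percentiles using linear interpolation between the sorted values. The helper `percentile` is written for this example, and linear interpolation is only one of several percentile conventions; other conventions give somewhat different endpoints when there are so few values.

```python
# Means of the 20 bootstrap samples, in the order they appear in Table 1.
boot_means = [
    6.4, 6.6, 6.8, 6.8, 6.6,
    6.8, 8.4, 5.8, 5.0, 6.0,
    6.4, 6.6, 7.2, 4.4, 5.0,
    6.0, 6.6, 6.4, 6.4, 4.8,
]


def percentile(sorted_values, p):
    """Percentile p (0 to 100), interpolating linearly between sorted values."""
    position = (p / 100) * (len(sorted_values) - 1)
    lower = int(position)  # index of the value just below the position
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (
        sorted_values[upper] - sorted_values[lower]
    )


ordered = sorted(boot_means)
ci_low = percentile(ordered, 2.5)
ci_high = percentile(ordered, 97.5)
print(round(ci_low, 2), round(ci_high, 2))  # about 4.59 and 7.83
```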

Case study using SPSS

In this case study, bootstrapping is performed for two variables, namely height and weight. Bootstrapping can be applied in SPSS, where it is available for a number of different analyses. For this article, bootstrapping is performed using Pearson correlation analysis. Bootstrapping and jackknife are most useful in cases where the data do not follow a normal distribution.
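Outside SPSS, the same idea can be sketched in Python. The height and weight values below are hypothetical numbers invented purely for illustration; they are not the dataset used in this article. The sketch resamples (height, weight) pairs together, recomputes the Pearson correlation for each resample, and takes approximate 2.5th and 97.5th percentiles by index. It does not attempt to reproduce the SPSS procedure.

```python
import random
from math import sqrt

# Hypothetical height (cm) and weight (kg) pairs, invented for illustration.
heights = [150, 155, 160, 162, 165, 168, 170, 175, 180, 185]
weights = [52, 55, 60, 58, 63, 66, 70, 72, 80, 84]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def bootstrap_correlations(xs, ys, n_resamples=1000, seed=0):
    """Sorted correlations from resampling the (x, y) pairs with replacement."""
    rng = random.Random(seed)
    pairs = list(zip(xs, ys))
    results = []
    while len(results) < n_resamples:
        rx, ry = zip(*rng.choices(pairs, k=len(pairs)))
        # Skip the rare resample with no variation, where r is undefined.
        if len(set(rx)) > 1 and len(set(ry)) > 1:
            results.append(pearson_r(rx, ry))
    return sorted(results)


observed_r = pearson_r(heights, weights)
boot_r = bootstrap_correlations(heights, weights)
r_low = boot_r[int(0.025 * len(boot_r))]        # approximate 2.5th percentile
r_high = boot_r[int(0.975 * len(boot_r)) - 1]   # approximate 97.5th percentile
print(round(observed_r, 3), round(r_low, 3), round(r_high, 3))
```

Resampling whole pairs, rather than heights and weights separately, keeps each person's two measurements together and so preserves the relationship being measured.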

Analyze > descriptive summary

This case uses the same dataset as in the logistic regression article. Figure 1 shows the descriptive statistics without bootstrapping: the mean scores of four different variables, together with their standard errors and the skewness and kurtosis values.

Figure 1: Descriptive statistics without bootstrapping

Next, bootstrapping is performed for the descriptive analysis on the same data. 1000 bootstrap samples were used, with a 95% confidence interval.

Analyze > Descriptive analysis > bootstrapping

Figure 2: Results of bootstrapping using SPSS

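The figure above shows the results of bootstrapping for the interview scores. In this case, the bootstrap estimate of the standard error of the mean is 1.58, compared with 1.67 without bootstrapping. This does not mean that bootstrapping has reduced the bias in the dataset; the data are unchanged, and the bootstrap simply provides an alternative estimate of the standard error that does not rely on the assumption of normality.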
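A bootstrap standard error of the mean that comes out slightly below the conventional value is to be expected. Resampling treats the sample as if it were the whole population, so the spread is effectively computed with a divisor of n rather than n - 1. The Python sketch below illustrates this with the five data points from the earlier example, not with the SPSS dataset; the function name and seed are assumptions made for this illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

data = [5, 4, 8, 9, 7]  # the five data points from the earlier example


def bootstrap_se_of_mean(values, n_resamples=1000, seed=0):
    """Standard deviation of the means of n_resamples bootstrap samples."""
    rng = random.Random(seed)
    boot_means = [
        mean(rng.choices(values, k=len(values))) for _ in range(n_resamples)
    ]
    return stdev(boot_means)


formula_se = stdev(data) / sqrt(len(data))  # conventional s / sqrt(n)
boot_se = bootstrap_se_of_mean(data)

print("formula SE:  ", round(formula_se, 3))
print("bootstrap SE:", round(boot_se, 3))
```

The gap between the two values shrinks as the sample size grows, since the ratio of n - 1 to n approaches one.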

Applications of bootstrap & jackknife

Analyzing null models, competition and community structure.
Detecting density dependence.
Characterizing spatial patterns and processes.
Estimating population size and vital rates.
Building environmental models.
Studying evolutionary processes and rates.
Conducting phylogenetic analysis.
Calculating an appropriate sample size for experimental design.
Calculating an estimator that is a sample analogue of a parameter.
Estimating the bias and standard error of a statistic calculated from a random sample of observations.
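The last application can be illustrated with a minimal jackknife sketch in Python, again using the five data points from the example. The statistic is recomputed five times, leaving out one observation each time, and the spread of those leave-one-out estimates gives the bias and standard error estimates. The function names are chosen for this illustration only.

```python
from math import sqrt

data = [5, 4, 8, 9, 7]  # the five data points from the earlier example


def jackknife(values, statistic):
    """Return (bias estimate, standard error estimate) for statistic(values)."""
    n = len(values)
    full_estimate = statistic(values)
    # Recompute the statistic n times, leaving out one observation each time.
    leave_one_out = [statistic(values[:i] + values[i + 1:]) for i in range(n)]
    loo_mean = sum(leave_one_out) / n
    bias = (n - 1) * (loo_mean - full_estimate)
    variance = (n - 1) / n * sum((est - loo_mean) ** 2 for est in leave_one_out)
    return bias, sqrt(variance)


def sample_mean(values):
    return sum(values) / len(values)


jack_bias, jack_se = jackknife(data, sample_mean)
print("jackknife bias:", jack_bias)
print("jackknife SE:  ", round(jack_se, 3))
```

For the sample mean, the jackknife bias estimate is zero and the jackknife standard error equals the conventional standard error of the mean (about 0.93 here). The method becomes more informative for statistics such as a ratio or a correlation coefficient, where no simple formula is available.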

Several software packages support these methods, including R, SAS, S-PLUS, Resampling Stats, MATLAB, Stata and SPSS.
