# How to perform bootstrap and jackknife analysis?

Bootstrap and jackknife are closely related statistical techniques that both rely on resampling the data. They are nonparametric methods for estimating the standard error and confidence interval of an estimate of a population parameter, such as a mean, median, proportion, odds ratio, correlation coefficient or regression coefficient. Quenouille introduced the jackknife in 1949 to estimate the bias of an estimator; it is also used to estimate an estimator's variance. Bradley Efron introduced the bootstrap in 1979 for evaluating the variance of an estimator. Both methods are also helpful in calculating an appropriate sample size for experimental design.

## Examples of bootstrap & jackknife

This section presents an example for the application of bootstrap and jackknife. Suppose that there are five data points:

5, 4, 8, 9, 7.

Resample the data points with replacement from the original sample to create bootstrap samples. Each bootstrap sample has a size of five, the same as the original sample. Because the data points are selected at random, the bootstrap samples may differ from the original sample and from each other.

The table below represents an example of 20 bootstrap samples:

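| # | Bootstrap sample | Mean |
| --- | --- | --- |
| 1 | 4, 9, 8, 7, 4 | 6.4 |
| 2 | 5, 8, 7, 9, 4 | 6.6 |
| 3 | 9, 9, 7, 4, 5 | 6.8 |
| 4 | 8, 8, 7, 4, 7 | 6.8 |
| 5 | 8, 5, 7, 8, 5 | 6.6 |
| 6 | 5, 7, 7, 7, 8 | 6.8 |
| 7 | 9, 8, 8, 8, 9 | 8.4 |
| 8 | 4, 9, 5, 7, 4 | 5.8 |
| 9 | 5, 4, 7, 4, 5 | 5 |
| 10 | 8, 4, 5, 9, 4 | 6 |
| 11 | 7, 4, 9, 8, 4 | 6.4 |
| 12 | 5, 9, 8, 4, 7 | 6.6 |
| 13 | 9, 9, 9, 4, 5 | 7.2 |
| 14 | 5, 5, 4, 4, 4 | 4.4 |
| 15 | 4, 4, 5, 7, 5 | 5 |
| 16 | 5, 7, 5, 5, 8 | 6 |
| 17 | 9, 7, 4, 8, 5 | 6.6 |
| 18 | 7, 8, 8, 5, 4 | 6.4 |
| 19 | 8, 5, 4, 8, 7 | 6.4 |
| 20 | 5, 4, 5, 5, 5 | 4.8 |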

Table 1 Creating bootstrap samples from the original sample
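The resampling step can be sketched in Python using only the standard library. This is a minimal illustration, not the procedure used to produce Table 1: the helper name `bootstrap_samples` and the seed value are assumptions made for the example, so the samples it draws will differ from those in the table.

```python
import random
from statistics import mean


def bootstrap_samples(data, n_samples, seed=None):
    """Draw n_samples resamples of len(data) values each, with replacement."""
    rng = random.Random(seed)
    return [rng.choices(data, k=len(data)) for _ in range(n_samples)]


data = [5, 4, 8, 9, 7]  # original sample from the text
samples = bootstrap_samples(data, n_samples=20, seed=1)  # seed is arbitrary

# Print each bootstrap sample with its mean, as in Table 1.
for s in samples:
    print(s, "mean =", round(mean(s), 1))
```

Because the draws are made with replacement, a value can appear several times in one bootstrap sample while another value is absent.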

### Results

In this case, the bootstrap is used to calculate a confidence interval for the population mean. First, calculate the mean of each bootstrap sample. Arranged in ascending order, the 20 means are: 4.4, 4.8, 5, 5, 5.8, 6, 6, 6.4, 6.4, 6.4, 6.4, 6.6, 6.6, 6.6, 6.6, 6.8, 6.8, 6.8, 7.2 and 8.4.

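Next, calculate the confidence interval from the bootstrap sample means. Since the 95% confidence interval is the most common, use the 2.5th and 97.5th percentiles as the endpoints of the interval. This is because the remaining (100% – 95%) = 5% is split equally between the two tails, leaving the middle 95% of the bootstrap sample means.

For the 20 bootstrap means in Table 1, these percentiles (with linear interpolation between neighbouring values) are approximately 4.6 and 7.8, so the 95% bootstrap confidence interval for the population mean is roughly 4.6 to 7.8. With only 20 bootstrap samples the endpoints are rough; in practice, 1000 or more bootstrap samples are typically used.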
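As a rough check, the percentile interval can be computed directly from the 20 bootstrap samples in Table 1. The sketch below assumes the 2.5th and 97.5th percentiles with linear interpolation (the `inclusive` method of Python's `statistics.quantiles`); other interpolation rules give slightly different endpoints when there are so few values.

```python
from statistics import mean, quantiles

# The 20 bootstrap samples listed in Table 1.
samples = [
    [4, 9, 8, 7, 4], [5, 8, 7, 9, 4], [9, 9, 7, 4, 5], [8, 8, 7, 4, 7],
    [8, 5, 7, 8, 5], [5, 7, 7, 7, 8], [9, 8, 8, 8, 9], [4, 9, 5, 7, 4],
    [5, 4, 7, 4, 5], [8, 4, 5, 9, 4], [7, 4, 9, 8, 4], [5, 9, 8, 4, 7],
    [9, 9, 9, 4, 5], [5, 5, 4, 4, 4], [4, 4, 5, 7, 5], [5, 7, 5, 5, 8],
    [9, 7, 4, 8, 5], [7, 8, 8, 5, 4], [8, 5, 4, 8, 7], [5, 4, 5, 5, 5],
]
boot_means = sorted(mean(s) for s in samples)

# n=40 gives cut points in 2.5% steps, so the first and last cut points
# are the 2.5th and 97.5th percentiles.
cuts = quantiles(boot_means, n=40, method="inclusive")
lower, upper = cuts[0], cuts[-1]
print(f"95% percentile interval: {lower:.2f} to {upper:.2f}")
```

With these 20 means the interpolated endpoints come out at about 4.59 and 7.83.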

## Case study using SPSS

In this article, bootstrapping is performed for two variables, height and weight. Bootstrapping can be applied in SPSS, where it is available for a number of different analyses. Here it is performed with Pearson correlation analysis. Bootstrapping and jackknife are most useful when the data do not follow a normal distribution.
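Outside SPSS, the same idea can be sketched in Python. The example below is an illustration under stated assumptions, not the SPSS procedure: the `height` and `weight` values are hypothetical (the article's dataset is not reproduced here), `pearson_r` is a helper written for this sketch, and the interval is a simple percentile interval from 1000 resamples.

```python
import math
import random
from statistics import quantiles


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical height (cm) and weight (kg) values, invented for illustration.
height = [150, 155, 160, 162, 165, 168, 170, 175, 180, 185]
weight = [52, 55, 60, 58, 63, 70, 68, 74, 80, 82]

rng = random.Random(0)  # arbitrary seed so the run is repeatable
n = len(height)
boot_r = []
while len(boot_r) < 1000:
    # Resample (height, weight) pairs together so the pairing is preserved.
    idx = [rng.randrange(n) for _ in range(n)]
    xs = [height[i] for i in idx]
    ys = [weight[i] for i in idx]
    # Skip degenerate resamples in which either variable has zero variance.
    if len(set(xs)) > 1 and len(set(ys)) > 1:
        boot_r.append(pearson_r(xs, ys))

cuts = quantiles(boot_r, n=40, method="inclusive")
print(f"r = {pearson_r(height, weight):.3f}")
print(f"95% bootstrap CI: {cuts[0]:.3f} to {cuts[-1]:.3f}")
```

Resampling whole pairs, rather than heights and weights separately, keeps the relationship between the two variables intact in every bootstrap sample.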

**Analyze > descriptive summary**

This case uses the same dataset as the logistic regression article. The results in Table 1 are the descriptive statistics without bootstrapping: the mean score of four different variables, with the standard error and the skewness and kurtosis values.

### Descriptive statistics

Next, bootstrapping was performed for the descriptive analysis on the same data. 1000 bootstrap samples were used, with a 95% confidence interval.

### Analyze > Descriptive analysis > bootstrapping

The figure above shows the results from bootstrapping for interview scores. In this case, the standard error of the mean decreases from 1.67 without bootstrapping to 1.58 with bootstrapping. The bootstrap does not alter the data; it provides a resampling-based estimate of the standard error that does not depend on a normality assumption, and here that estimate is slightly smaller.
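The comparison between the conventional and the bootstrap standard error of the mean can be sketched as follows. The `scores` list is hypothetical and invented for illustration, so the numbers it prints will not match the SPSS output described above; the 1000 resamples mirror the setting used in the case study.

```python
import math
import random
from statistics import mean, stdev

# Hypothetical interview scores, invented for illustration only.
scores = [62, 71, 55, 80, 68, 74, 59, 90, 66, 77, 50, 85]

# Conventional standard error of the mean: s / sqrt(n).
se_formula = stdev(scores) / math.sqrt(len(scores))

# Bootstrap standard error: standard deviation of the resampled means.
rng = random.Random(42)  # arbitrary seed so the run is repeatable
boot_means = [mean(rng.choices(scores, k=len(scores))) for _ in range(1000)]
se_boot = stdev(boot_means)

print(f"formula SE = {se_formula:.2f}, bootstrap SE = {se_boot:.2f}")
```

For a sample mean the two estimates are usually close. The bootstrap value tends to be slightly smaller in small samples because resampling reproduces the variance of the observed data (divisor n) rather than the sample variance (divisor n - 1).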

## Applications of bootstrap & jackknife

- Analysing null models, competition and community structure.
- Detecting density dependence.
- Characterizing spatial patterns and processes.
- Estimating population size and vital rates.
- Environmental modeling.
- Estimating evolutionary processes and rates.
- Conducting phylogenetic analysis.
- Calculating an appropriate sample size for experimental design.
- Calculating an estimator that is a sample analogue of a parameter.
- Estimating the bias and standard error of a statistic calculated from a random sample of observations.
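The last application, estimating the bias and standard error of a statistic, is what the jackknife is typically used for. A minimal sketch, assuming the standard leave-one-out (delete-one) formulation and using the five data points from the example section; the helper name `jackknife` is chosen for this illustration.

```python
import math
from statistics import mean


def jackknife(data, statistic):
    """Leave-one-out jackknife estimates of bias and standard error."""
    n = len(data)
    theta_hat = statistic(data)
    # Recompute the statistic n times, each time leaving one observation out.
    loo = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = sum(loo) / n
    bias = (n - 1) * (loo_mean - theta_hat)
    se = math.sqrt((n - 1) / n * sum((t - loo_mean) ** 2 for t in loo))
    return theta_hat, bias, se


data = [5, 4, 8, 9, 7]  # original sample from the example section
theta_hat, bias, se = jackknife(data, mean)
print(f"mean = {theta_hat}, jackknife bias = {bias:.3f}, jackknife SE = {se:.3f}")
```

For the sample mean, the jackknife bias estimate is zero and the jackknife standard error equals the usual s / sqrt(n), which makes the mean a convenient sanity check before applying the same function to a less tractable statistic.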

Several software packages support these methods, including R, SAS, S-PLUS, Resampling Stats, MATLAB, Stata and SPSS.

