# Basic terms of statistics

SPSS, short for “Statistical Package for the Social Sciences”, was first developed by Norman H. Nie, Dale H. Bent and C. Hadlai Hull in 1968. The software is used to conduct statistical analysis on a sample drawn from a larger population. Some of the statistical terms used for sample analysis in this software are explained below:

## The mean (average)

Mean is the average of all values in a particular column. It is generally represented by µ (Babraham Bioinformatics, n.d.).

**µ= ∑(X)/N**

(Where, ∑= sum of, X= Individual data points, and N= Sample size)

Also known as the ‘average’, it is the most common statistical tool with which researchers estimate the population mean when conducting quantitative analysis of a sample. Because the mean is influenced by outliers (very small or very large values), it is not always a fair representation of the data. For example, suppose we calculate the mean of the English marks of students in a sample of 100 respondents. If most of the students have earned nearly 60% marks, the mean would still be pulled up by high graders (more than 80%) and pulled down by low graders (less than 20%), so it may not reflect the typical score.
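The effect of an outlier on the mean can be seen in a minimal Python sketch. The marks below are hypothetical values chosen only for illustration; they do not come from any real data set.

```python
from statistics import mean

# Hypothetical English marks (in %) for a small group of students.
typical_marks = [58, 60, 61, 59, 62, 60]
# The same group with one very high scorer added (an outlier).
marks_with_outlier = typical_marks + [98]

# The mean is the sum of the values divided by the number of values.
mean_typical = mean(typical_marks)
mean_with_outlier = mean(marks_with_outlier)

print(f"Mean without outlier: {mean_typical:.2f}")
print(f"Mean with outlier:    {mean_with_outlier:.2f}")
```

A single extreme value is enough to move the mean away from the marks that most students actually earned.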

## The median

The median is the numerical value that separates the higher half of the sample from the lower half. In layman's terms, it is the middle value in the sample or probability distribution (Weisstein, n.d.). The median is generally applied in situations where the researcher cannot get a proper measurement and therefore ranks the data in order. For example, when ranking students’ performance in a class, the student in the middle of the ranking would represent the median performance of the class.
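As a short sketch, the median can be found by sorting the values and taking the middle one. The marks below are hypothetical. With an even number of values, Python's `statistics.median` averages the two middle values.

```python
from statistics import median

# Hypothetical marks; the values are illustrative only.
marks_odd = [45, 72, 60, 58, 91]        # five values: the middle one is the median
marks_even = [45, 72, 60, 58, 91, 64]   # six values: average of the two middle ones

median_odd = median(marks_odd)     # sorted: 45, 58, 60, 72, 91
median_even = median(marks_even)   # sorted: 45, 58, 60, 64, 72, 91

print(median_odd, median_even)
```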

## The variance

Variance is the measure of how far the numbers in the sample are spread out. For example, if the variance in the sample is zero, all the values are identical. As a loose illustration, suppose a sample of 100 respondents was asked about the chocolate brand they like the most. If all the respondents chose the same brand, there is no variation in the responses; if some respondents chose other brands, the responses vary. A small variance indicates that the values are close to the mean (Weisstein, n.d.).

**Variance (S²) = ∑(X − µ)² / (N − 1)**

When we see variability in sample data, there are reasons behind that variability. For example, when a sample population is asked why they prefer to shop at a particular store, they may give varied reasons or similar ones. This enables the researcher to present findings related to that sample, based on which they can suggest recommendations to stores that have low sales. Therefore, to analyse variability, the researcher needs to find out whether something important has happened. The variance helps the researcher answer these questions.
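The sample variance is the sum of squared deviations from the mean, divided by N − 1. The sketch below computes it by hand and compares the result with Python's built-in `statistics.variance`. The helper name `sample_variance` and the data are hypothetical, introduced only for this illustration.

```python
from statistics import mean, variance

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by N - 1."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)

identical = [4, 4, 4, 4]   # every value is the same, so the variance is zero
spread = [2, 4, 4, 6]      # same mean (4), but the values are spread out

print(sample_variance(identical))
print(sample_variance(spread), variance(spread))
```

The hand-written version and the library function agree, and identical values give a variance of zero, as described above.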

## The standard deviation

The square root of the variance is called the standard deviation. It is denoted by SD (Weisstein, n.d.). It measures the variability in the sample and describes how the rest of the data relate to the mean. If the responses given by the sample are close to the mean, the data are uniform and the standard deviation is small; if the responses are far from the mean, it is large. If all the values are the same, the standard deviation is zero. The standard deviation can be calculated using the following formula.

**S = √( ∑(X − X̄)² / (n − 1) )**

(Where S = standard deviation, ∑ = sum of, X = each value in the data set, X̄ = mean of all values in the data set, and n = number of values in the data set)

Standard deviation is also used to compare two sets of data effectively. For example, data set 1 contains 1, 3, 5 and data set 2 contains 0, 3, 6. The mean of the two data sets is the same (3); however, the standard deviations differ: s(1) = 2 and s(2) = 3. Without the standard deviation, the researcher cannot tell how closely the data cluster around the mean.
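The comparison of the two data sets can be checked with a few lines of Python. This is a sketch using `statistics.stdev`, which computes the sample standard deviation (dividing by n − 1).

```python
from statistics import mean, stdev

data_set_1 = [1, 3, 5]
data_set_2 = [0, 3, 6]

# Both data sets share the same mean but differ in spread.
for name, data in (("data set 1", data_set_1), ("data set 2", data_set_2)):
    print(f"{name}: mean = {mean(data)}, s = {stdev(data)}")
```

Both means come out as 3, while the standard deviations are 2 and 3 respectively.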

## Confidence interval

The confidence interval quantifies the uncertainty in a measurement. When the mean of a sample is calculated, it may not equal the true population mean, and the size of the discrepancy depends on the variability of the values (the change in responses among the respondents) and on the sample size (denoted N, the part of the larger population that was sampled). One therefore combines these two to calculate a 95% or 98% confidence interval. A 95% confidence interval reflects a 5% risk of being wrong, and a 98% confidence interval reflects a 2% risk of being wrong. The interval gives the range within which the true population mean is expected to lie (Weisstein, n.d.).
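One common way to combine variability and sample size is the interval mean ± z × s / √n. The sketch below assumes a normal approximation, which is a simplification chosen here for illustration; for small samples a t-distribution critical value is usually preferred, and SPSS may use a different method. The function name `confidence_interval` and the marks are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval(values, confidence=0.95):
    """Normal-approximation interval for the mean: mean ± z * s / sqrt(n)."""
    n = len(values)
    m = mean(values)
    standard_error = stdev(values) / sqrt(n)
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return m - z * standard_error, m + z * standard_error

marks = [58, 60, 61, 59, 62, 60, 57, 63]  # hypothetical sample of marks
low_95, high_95 = confidence_interval(marks, 0.95)
low_98, high_98 = confidence_interval(marks, 0.98)

print(f"95% interval: {low_95:.2f} to {high_95:.2f}")
print(f"98% interval: {low_98:.2f} to {high_98:.2f}")
```

The 98% interval is wider than the 95% interval: accepting a smaller risk of being wrong requires a wider range.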

## The mode

The mode is the value in a distribution with the highest frequency. It can be calculated even for nominal data (Argyrous, 2011). The mode is normally used for categorical data, where the researcher wants to find the most common category.
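Because the mode only counts how often each value occurs, it works on category labels as well as numbers. A brief sketch with hypothetical survey responses:

```python
from collections import Counter
from statistics import mode, multimode

# Hypothetical answers to "Which brand do you prefer?" (nominal data).
responses = ["Brand A", "Brand B", "Brand A", "Brand C", "Brand A", "Brand B"]

most_common = mode(responses)      # the single most frequent category
frequencies = Counter(responses)   # how often each category occurs

# multimode returns every category tied for the highest frequency.
tied = multimode(["Brand A", "Brand B", "Brand A", "Brand B", "Brand C"])

print(most_common, dict(frequencies), tied)
```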

## Range

The range of a set of data is the difference between the largest and the smallest values (Woodbury, 2001). It is also used to describe the variability in a sample or population, and it helps in understanding how well the mean can represent the data. It is also useful when the studied variable has a critical low or high threshold that should not be crossed, because the extreme values show whether it has been. For example, if the researcher is studying the mileage of 10 cars, the maximum value would be the highest mileage, the minimum value would be the lowest mileage, and the range would be the difference between the two.
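The mileage example can be sketched in Python as follows. The figures are hypothetical and the unit (km per litre) is an assumption made for illustration.

```python
# Hypothetical mileage (km per litre) for 10 cars.
mileage = [14.2, 17.8, 15.5, 21.0, 16.3, 18.9, 12.7, 19.4, 15.0, 17.1]

highest = max(mileage)
lowest = min(mileage)
mileage_range = highest - lowest   # range = largest value - smallest value

print(f"Highest: {highest}, lowest: {lowest}, range: {mileage_range:.1f}")
```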

## Interquartile range (IQR)

The interquartile range is a measure of statistical variation and is equal to the difference between the upper and the lower quartiles; the quartiles divide the sample into 4 equal parts (Upton and Cook, 1996). The first or lower quartile (Q1) is the middle number between the smallest value and the median, and the third or upper quartile (Q3) is the middle number between the median and the largest value. The median is also called the second quartile (Q2).

It is similar to the range; the difference is that it covers only the middle half of the data. It is also significant because it is not sensitive to outliers, which affect the other measures of spread discussed above.

**IQR = Q3 − Q1**
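A short Python sketch of the interquartile range on hypothetical data is given below. Several conventions exist for locating quartiles and they can give slightly different values; `method="inclusive"` is one choice made here for illustration and is not necessarily the one SPSS applies.

```python
from statistics import quantiles

# Hypothetical data; 40 is a deliberate outlier.
values = [1, 3, 4, 6, 7, 9, 12, 15, 40]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1
full_range = max(values) - min(values)

print(f"Q1 = {q1}, Q2 (median) = {q2}, Q3 = {q3}")
print(f"IQR = {iqr}, range = {full_range}")
```

The outlier stretches the range to 39, while the interquartile range stays at 8 because it depends only on the middle half of the data.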

**References**

- Argyrous, G. (2011) [Online] Statistics for Research, 3rd ed.
- Babraham Bioinformatics (n.d.) [Online] Introduction to Statistics with SPSS. Available at: http://www.bioinformatics.babraham.ac.uk/training/SPSS%20Course%20Manual.pdf.
- Upton, G. and Cook, I. (1996) Understanding Statistics. Oxford University Press, p. 55.
- Weisstein, E. W. (n.d.) [Online] Statistical Median. MathWorld, A Wolfram Web Resource. Available at: http://mathworld.wolfram.com/StatisticalMedian.html.
- Woodbury, G. (2001) An Introduction to Statistics. Cengage Learning, p. 74.

Priya is the co-founder and Managing Partner of Project Guru, a research and analytics firm based in Gurgaon. She is responsible for the human resource planning and operations functions. Her expertise in analytics has been used in a number of service-based industries like education and financial services.

