Descriptive analysis terminologies in statistics

By Riya Jain & Priya Chetty on August 23, 2021

Descriptive analysis is a popular method of examining a dataset’s characteristics. Statistics involves many symbols, letters, equations and abbreviations, each representing a different term or value. For instance, the ‘mean’ is one of the simplest and most basic measurements: it represents the average value of the observations in a dataset. The mean of a population is usually represented by the symbol ‘μ’. Such measures are used to investigate patterns, trends, or relationships between items in a dataset.

Likewise, many other statistical terms and symbols occur frequently in statistical analysis. In this article, we review some of the most basic terms used in the following statistical methods:

Descriptive analysis
Correlation
Regression

Basic terms of statistics

Statistical analysis starts with the creation of a ‘dataset’ (Cambridge, 2021), i.e. a collection of data arranged in tabular form, with the rows and columns of a datasheet holding the values. The image below represents a sample dataset.

Every dataset collected for statistical analysis, like the one shown above, has two terms in common:

Number of observations (n): The total sample size of the dataset. For instance, in Figure 1 above, the sample size is 5, as there are 5 respondents. Each respondent’s record is technically called an ‘observation’.
Margin of error (e): The amount of error or bias that may arise from the way the sample is drawn or the dataset is collected (Thompson and Liu, 2005). Most studies aim to keep the margin of error low. For example, a margin of error of 0.03 means that a result from the sample is expected to lie within 3 percentage points of the true population value.

In quantitative research, the first step after collecting the data is to summarise it in the form of graphs, charts or tables. This summary, known as descriptive analysis, provides information on the average values or the relationships between variables in the dataset. Descriptive analysis can be performed in three ways (Hayes and Smith, 2021): frequency distribution, central tendency, and variance.

Frequency distribution in descriptive analysis

In descriptive analysis, the frequency distribution simply states the number of times each response occurs in a dataset. For example, in Figure 1 above, ‘male’ appears 4 times whereas ‘female’ appears once. The frequency distribution of that dataset is therefore:

Gender	Frequency
Female	1
Male	4

Table 1: Sample for frequency distribution
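
The counting step behind a frequency distribution can be sketched in a few lines of Python. This is only an illustration: the `genders` list is a hypothetical column that reproduces the counts in Table 1 (4 male, 1 female), and the order of its entries is an assumption, since only the totals are given.

```python
from collections import Counter

# Hypothetical gender column: the counts match Table 1,
# but the order of the respondents is assumed.
genders = ["male", "female", "male", "male", "male"]

# Counter tallies how many times each response occurs.
frequency = Counter(genders)

# Print the tally as a two-column table, sorted by response label.
print("Gender\tFrequency")
for gender in sorted(frequency):
    print(f"{gender.capitalize()}\t{frequency[gender]}")
```

The frequencies always add up to the number of observations (n), which is a quick way to check that no response was missed.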

Central tendency in descriptive analysis

Central tendency is the calculation of the ‘central value’ of a dataset, i.e. a single value that ideally represents all values in that dataset. There are three ways to determine the central value in descriptive analysis: mean, median, and mode (University of Utah, 2021).

Mean: The most commonly used measure of central tendency. It is the average value of the dataset, i.e. the sum of all observations divided by the number of observations (n).
Median: The exact middle observation in the entire dataset. To find it, the observations are arranged in either ascending or descending order. If the number of observations n is odd, the [(n+1)/2]th observation is the median; if n is even, the median is the average of the two middle observations, i.e. [(n/2)th + (n/2 + 1)th]/2.
Mode: The most frequent observation in the entire dataset. It is determined by counting the number of times each value appears. A dataset could have no mode, one mode, or many modes.
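
The odd and even cases of the median rule can be sketched as follows. The function name `median_by_position` and the two small samples are hypothetical and chosen only for illustration; Python's built-in `statistics.median` applies the same rule and is used here as a cross-check.

```python
import statistics


def median_by_position(values):
    """Return the median using the positional rule for odd and even n."""
    ordered = sorted(values)  # arrange the observations in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty dataset is undefined")
    if n % 2 == 1:
        # Odd n: the [(n+1)/2]th observation (lists are 0-indexed, hence -1).
        return ordered[(n + 1) // 2 - 1]
    # Even n: the average of the (n/2)th and (n/2 + 1)th observations.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


# Hypothetical samples, one with odd n and one with even n.
odd_sample = [7, 1, 4]
even_sample = [7, 1, 4, 9]

print(median_by_position(odd_sample), statistics.median(odd_sample))
print(median_by_position(even_sample), statistics.median(even_sample))
```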

For example, consider a sample dataset of the ages of 5 people (n=5). Its mean, median and mode can be calculated as shown in the table below.

Observations (age of people in years)	10, 3, 2, 10, 5
Mean	Sum of all values = 10 + 3 + 2 + 10 + 5 = 30	Number of observations = 5	Mean = 30/5 = 6
Median	Ascending order = 2, 3, 5, 10, 10	Number of observations is odd, i.e. 5	Median = [(5+1)/2]th = 3rd observation = 5
Mode	Ascending order = 2, 3, 5, 10, 10	Only 10 occurs more than once	Mode = 10

Table 2: Measures of central tendency in descriptive analysis
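
The same three measures can be reproduced with Python's standard `statistics` module. This is a minimal sketch using the five ages from the example; the variable names are illustrative, and `multimode` (available from Python 3.8) is used because it returns every mode when a dataset has more than one.

```python
import statistics

# Ages (in years) of the five people in the example.
ages = [10, 3, 2, 10, 5]

mean_age = statistics.mean(ages)      # sum of observations / number of observations
median_age = statistics.median(ages)  # middle value after sorting
modes = statistics.multimode(ages)    # list of the most frequent value(s)

print("Mean:", mean_age)
print("Median:", median_age)
print("Mode(s):", modes)
```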

Variance in descriptive analysis

The last method of descriptive analysis is variance, i.e. the measurement of the variation or spread of the dataset: how far the values are from one another. This spread can be measured mainly in three forms: range, standard deviation, and variance (University of Utah, 2021).

Range: The distance between the largest and the smallest value in the dataset. It is measured by subtracting the smallest value from the largest value.
Standard deviation: It measures the variability in the dataset, i.e. how far the values typically are from the average value of the dataset. A higher standard deviation means that the values are, on average, further from the mean, i.e. the data are more spread out. It is calculated by first computing the mean value (M) and then the difference between each observation (x_i) and the mean (x_i – M). These deviations are squared and summed, the sum is divided by the number of observations (n) minus 1, and the square root of the result is taken. The formula for standard deviation is:

s = √[ Σ (x_i – M)² / (n – 1) ]

Variance: The squared value of the standard deviation, i.e. s². The formula for variance is:

s² = Σ (x_i – M)² / (n – 1)

For example, consider the same dataset of 5 people’s ages (n=5). Its range, standard deviation and variance are calculated in the table below.

Table 3: Example of variance in descriptive analysis

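Observations (people’s age in years)	10, 3, 2, 10, 5
Range	Smallest = 2	Largest = 10	Range = 10 - 2 = 8
Standard deviation	Deviations from the mean (6) = (10-6, 3-6, 2-6, 10-6, 5-6) = (4, -3, -4, 4, -1)	Sum of squared deviations = 16 + 9 + 16 + 16 + 1 = 58	Standard deviation = √(58/4) = √14.5 ≈ 3.81
Variance	Sum of squared deviations = 58	n - 1 = 4	Variance = 58/4 = 14.5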
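
The steps for range, variance and standard deviation can be sketched in Python as follows. This is an illustrative calculation on the five ages from the example, dividing by n - 1 as described above; the variable names are assumptions, and `statistics.variance` and `statistics.stdev` from the standard library are used only as a cross-check, since they use the same n - 1 divisor.

```python
import math
import statistics

# Ages (in years) of the five people in the example.
ages = [10, 3, 2, 10, 5]

# Range: largest value minus smallest value.
value_range = max(ages) - min(ages)

# Deviations of each observation from the mean, then squared.
mean_age = statistics.mean(ages)
squared_deviations = [(x - mean_age) ** 2 for x in ages]

# Sample variance: sum of squared deviations divided by (n - 1).
sample_variance = sum(squared_deviations) / (len(ages) - 1)

# Standard deviation: square root of the variance.
sample_sd = math.sqrt(sample_variance)

print("Range:", value_range)
print("Sum of squared deviations:", sum(squared_deviations))
print("Variance:", sample_variance)
print("Standard deviation:", round(sample_sd, 2))

# Cross-check against the standard library.
print(statistics.variance(ages), statistics.stdev(ages))
```

For this dataset the sum of squared deviations is 58, so the variance is 58/4 = 14.5 and the standard deviation is roughly 3.81.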

References

Cambridge (2021) Dataset, Cambridge Dictionary. Available at: https://dictionary.cambridge.org/dictionary/english/dataset (Accessed: 21 August 2021).
Hayes, A. and Smith, A. (2021) What Are Descriptive Statistics?, Investopedia.
Thompson, P. and Liu, Y. (2005) ‘Understandings of margin of error’, in Proceedings of the Twenty-seventh Annual Meeting of the International Group for the Psychology of Mathematics Education. Roanoke: Virginia Tech.
University of Utah (2021) Central Tendency & Variability, University of Utah.

