Statistics involves many symbols, letters, equations and abbreviations, each representing a different term or value. For instance, the ‘mean’ is one of the simplest and most basic statistical measurements. It represents the average value of the observations in a sample dataset. In some cases, it is represented by the symbol ‘μ’ (conventionally used for a population mean). Descriptive analysis is a popular method of examining a dataset’s characteristics. Its measures are used to investigate patterns, trends, or relationships between items in a dataset.
Likewise, there are many other terms and symbols which frequently occur in statistical analysis. In this article, we review some of the most basic terms used in the following type of statistical analysis:
- Descriptive analysis
Basic terms of statistics
Statistical analysis starts with the creation of a ‘dataset’ (Cambridge, 2021), which refers to a collection of numerical data in tabular form. The rows and columns of the dataset carry the data. The image below represents a sample dataset.
The process of statistical analysis begins with the collection of a dataset like the one shown above. Every dataset will have two terms in common:
- Number of observations (n): The total sample size of the dataset. For instance, in Figure 1 above, the sample size is 5, as there are 5 respondents. The individual entries that make up this sample size are technically called ‘observations’.
- Margin of error (e): The possibility of error or bias in the derivation of the sample size or the collection of the dataset (Thompson and Liu, 2005). Most studies aim to keep a low margin of error. For example, a margin of error of 0.03 represents a 3% possibility of error.
In quantitative research, the first step after collecting the data is to summarise the findings in the form of graphs, charts or tables. This summary, known as descriptive analysis, provides information on the average values of the variables in the dataset or on the relationships between them. Descriptive analysis can be performed in three ways: frequency distribution, central tendency, and variance (Hayes and Smith, 2021).
Frequency distribution in descriptive analysis
In descriptive analysis, the frequency distribution simply states the number of times each response occurs in a dataset. For example, in Figure 1 above, ‘male’ appears 4 times whereas ‘female’ appears once. Therefore, the frequency distribution of that dataset is male = 4 and female = 1.
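As a rough illustration, the counting step can be sketched in Python. The list `genders` below is a hypothetical stand-in for the gender column of Figure 1. Only the counts (4 male, 1 female) come from the text; the order of the responses is an assumption.

```python
from collections import Counter

# Hypothetical responses: the counts follow the text (4 male, 1 female),
# but the order of the respondents is assumed.
genders = ["male", "male", "female", "male", "male"]

# Counter tallies how many times each distinct response occurs.
frequency = Counter(genders)

# Print the responses from most to least frequent.
for response, count in frequency.most_common():
    print(f"{response}: {count}")
```

The same approach works for any categorical column, since `Counter` only needs a sequence of hashable values.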
Central tendency in descriptive analysis
Central tendency is the calculation of the ‘central value’ of a dataset. A central value is one that ideally represents all values in that dataset. There are three ways to determine the central value in descriptive analysis: mean, median, and mode (University of Utah, 2021).
- Mean: The most commonly used measure of central tendency. It is the average value of the dataset, i.e. the sum of all observations divided by the number of observations.
- Median: The exact middle observation in the entire dataset. To find it, the observations are arranged in either ascending or descending order. If the number of observations (n) is odd, the [(n+1)/2]th observation is the median. If it is even, the median is the average of the two middle observations, i.e. [(n/2)th observation + (n/2 + 1)th observation]/2.
- Mode: The most frequent observation in the entire dataset. It is determined by counting the number of times each value appears. A dataset could have no mode, one mode, or many modes.
For example, consider a sample dataset of 5 people’s ages (n=5). Its mean, median and mode can be calculated as shown in the table below.
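The odd/even rule for the median and the 'no mode, one mode, or many modes' cases can be sketched as follows. The helper names `median_of` and `modes_of` are hypothetical, and treating a dataset in which no value repeats as having no mode is an assumed convention rather than something the text specifies.

```python
from collections import Counter

def median_of(values):
    """Median using the odd/even rule on the sorted observations."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the (n+1)/2-th observation (index n // 2 when counting from 0).
        return ordered[middle]
    # Even n: average of the (n/2)-th and (n/2 + 1)-th observations.
    return (ordered[middle - 1] + ordered[middle]) / 2

def modes_of(values):
    """All most-frequent values; empty list if no value repeats (assumed convention)."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(value for value, count in counts.items() if count == highest)

print(median_of([7, 1, 3]))         # odd number of observations
print(median_of([1, 2, 3, 4]))      # even number of observations
print(modes_of([1, 2, 3]))          # no mode
print(modes_of([1, 1, 2, 2, 3]))    # two modes
```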
|Observations (age of people in years)||10, 3, 2, 10, 5|
|Mean||Sum of all values = 10 + 3 + 2 + 10 + 5 = 30||Number of observations = 5||Mean = (30/5) = 6|
|Median||Ascending order = 2, 3, 5, 10, 10||Number of observations is odd (n = 5)||Median = [(5+1)/2]th term = 3rd term = 5|
|Mode||Ordered dataset = 2, 3, 5, 10, 10||Only 10 occurs more than once||Mode = 10|
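The values in the table can be checked with Python's standard `statistics` module. This is only a quick verification sketch, and the variable names are illustrative.

```python
import statistics

# Ages from the example above (in years).
ages = [10, 3, 2, 10, 5]

mean_age = statistics.mean(ages)      # sum of values / number of values
median_age = statistics.median(ages)  # middle value of the sorted data
mode_age = statistics.mode(ages)      # most frequent value

print("Mean:", mean_age)
print("Median:", median_age)
print("Mode:", mode_age)
```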
Variance in descriptive analysis
The last method of descriptive analysis measures the variation, or spread, of the dataset, i.e. how far the values lie from one another. This spread is mainly measured in three forms: range, standard deviation, and variance (University of Utah, 2021).
- Range: The distance between the largest and the smallest value in the dataset. It is measured by subtracting the smallest value from the largest value.
- Standard deviation: It measures the variability in the dataset, i.e. how far the values typically are from the average value of the dataset. A higher standard deviation means that the values are, on average, spread further from the mean. It is calculated by first computing the mean value ($M$) and then determining the difference between each observation ($x_i$) and the mean value ($x_i - M$). These differences are squared and summed, the sum is divided by the number of observations ($n$) minus 1, and lastly the square root is taken. The formula for standard deviation is: $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - M)^2}{n - 1}}$
- Variance: The squared value of the standard deviation, i.e. $s^2$. The formula for variance is: $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - M)^2}{n - 1}$
For example, consider the same dataset of 5 people’s ages (n=5). Its range, standard deviation and variance are calculated in the table below.
|Observations (people’s age in years)||10, 3, 2, 10, 5|
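|Range||Smallest = 2, Largest = 10||Range = 10 - 2 = 8|
|Standard deviation||Deviations from the mean = (10-6, 3-6, 2-6, 10-6, 5-6)||Sum of squared deviations = 16 + 9 + 16 + 16 + 1 = 58||Standard deviation = √(58/(5-1)) = √14.5 ≈ 3.81|
|Variance||Sum of squared deviations = 58||n - 1 = 4||Variance = 58/4 = 14.5|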
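The range, sample standard deviation and sample variance for the same ages can be reproduced step by step in Python. This is a minimal sketch that divides by n - 1, as described above. The variable names are illustrative, and the final lines cross-check the manual result against the standard `statistics` module.

```python
import math
import statistics

# Ages from the example above (in years).
ages = [10, 3, 2, 10, 5]
n = len(ages)

# Range: largest value minus smallest value.
value_range = max(ages) - min(ages)

# Mean, then the squared deviation of each observation from the mean.
mean_age = sum(ages) / n
squared_deviations = [(x - mean_age) ** 2 for x in ages]
sum_of_squares = sum(squared_deviations)

# Sample variance divides by (n - 1); the standard deviation is its square root.
variance = sum_of_squares / (n - 1)
std_dev = math.sqrt(variance)

print("Range:", value_range)
print("Sum of squared deviations:", sum_of_squares)
print("Variance:", variance)
print("Standard deviation:", round(std_dev, 2))

# Cross-check against the standard library's sample statistics.
assert math.isclose(variance, statistics.variance(ages))
assert math.isclose(std_dev, statistics.stdev(ages))
```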
References
- Cambridge (2021) Dataset, Cambridge Dictionary. Available at: https://dictionary.cambridge.org/dictionary/english/dataset (Accessed: 21 August 2021).
- Hayes, A. and Smith, A. (2021) What Are Descriptive Statistics?, Investopedia.
- Thompson, P. and Liu, Y. (2005) ‘Understandings of margin of error’, in Proceedings of the Twenty-seventh Annual Meeting of the International Group for the Psychology of Mathematics Education. Roanoke: Virginia Tech.
- University of Utah (2021) Central Tendency & Variability, University of Utah.