Introduction to quantitative data analysis methods

By Muskan and Priya Chetty on November 17, 2021

Quantitative data is information that can be quantified: it can be counted or measured and given a numerical value. It tends to be structured and is suitable for statistical analysis. Quantitative data is used to answer questions such as “How many?”, “How often?” or “How much?”. It can be discrete, taking only distinct, countable values (such as the number of employees), or continuous, taking any value within a range (such as height or salary).

Quantitative data analysis is helpful because it provides quantifiable and easy-to-understand results. Quantitative data can be analyzed in a variety of ways, and SPSS provides a large range of methods for doing so. This article discusses the methods most commonly used in small program evaluation, with examples.

Descriptive statistics helps to understand quantitative data

Descriptive statistics gives basic information about a dataset. Understanding these basics is important because the further steps of processing the data depend on it. In this type of data analysis, the information is presented using graphs and tables. The table below shows different methods of performing descriptive statistics.

| Method | Description | Example |
| --- | --- | --- |
| Frequency distribution | A simple and useful description of one variable. It gives both the frequency and the percentage of elements. | To determine the number of males and females in a group. E.g., out of 130 employees, 80 (62%) are male and 50 (38%) are female. |
| Crosstabulations | Classifies elements according to two variables. | From the data sample, how many males have a Master’s degree and how many females have a PhD? |
| Measures of central tendency | Measures of the location of the middle of a distribution. The most common are: Mean: the average value; Median: the middle value when the data are sorted; Mode: the most frequently occurring value. | A dataset has 11 values: 2, 4, 6, 8, 10, 10, 12, 14, 16, 18, 20. Mean: 10.91; Median: 10; Mode: 10. |
| Measures of variation | Help to describe the variables further. Min: the lowest value; Max: the highest value; Range: the difference between the lowest and highest value; Standard deviation: the dispersion of values around the mean. | A dataset has 11 values: 2, 4, 6, 8, 10, 10, 12, 14, 16, 18, 20. Min: 2; Max: 20; Range: 18; Standard deviation: 5.48 (population formula). |
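
As a rough illustration outside SPSS, the descriptive measures above can be reproduced with Python's standard library. This is only a sketch: the `genders` list is an illustrative reconstruction of the 80/50 employee split, and the standard deviation uses the population formula (dividing by n), which is an assumption about how the figure in the example was obtained.

```python
from collections import Counter
from statistics import mean, median, mode, pstdev

# Frequency distribution: counts and percentages for one variable.
# Illustrative list built from the 80 male / 50 female employee example.
genders = ["male"] * 80 + ["female"] * 50
freq = Counter(genders)
percent = {k: round(100 * v / len(genders), 1) for k, v in freq.items()}

# Central tendency and variation for the example dataset (eleven values).
data = [2, 4, 6, 8, 10, 10, 12, 14, 16, 18, 20]
summary = {
    "mean": round(mean(data), 2),
    "median": median(data),
    "mode": mode(data),
    "min": min(data),
    "max": max(data),
    "range": max(data) - min(data),
    # Population standard deviation (divides by n, not n - 1).
    "std_dev": round(pstdev(data), 2),
}

print(dict(freq), percent)
print(summary)
```

Swapping `pstdev` for `statistics.stdev` would give the sample standard deviation instead, which is slightly larger for the same data.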

Compare means to determine significant difference in quantitative data

A comparison of means tests whether the mean of interval/ratio (scale) data differs significantly from a hypothesized value or between different groups.

| Method | Description | Example |
| --- | --- | --- |
| T-test: independent samples | Compares the mean of one variable between two unrelated groups. | 1. To examine if there is a difference in the salary of male and female teachers. 2. To examine if exam scores differ between children of highly educated parents and children of uneducated parents. |
| T-test: paired samples | A dependent or “paired” samples t-test is used to see the difference or change between two measurement points. | To examine if the job satisfaction of an employee has improved after their boss took a course in “socio-emotional skills” (i.e. before versus after). |
| One-way ANOVA | Very similar to the independent samples t-test. The difference is that the one-way ANOVA allows more than two categories in the independent variable. | Comparing how many cups of coffee people drink per day depending on whether they have a low-stress, medium-stress, or high-stress job. |
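
To make the three tests more concrete, the sketch below computes their test statistics by hand using only Python's standard library. The function names and all sample numbers are hypothetical, the independent samples version assumes equal variances (the pooled Student's t), and p-values are left out because they require the t and F distributions, which a package such as SPSS supplies.

```python
from math import sqrt
from statistics import mean, stdev, variance


def independent_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two unrelated groups."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def paired_t(before, after):
    """t statistic and degrees of freedom for two measurements on the same subjects."""
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def one_way_anova_f(*groups):
    """F statistic: between-group variance divided by within-group variance."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
t_ind, df_ind = independent_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
t_pair, df_pair = paired_t([3, 4, 5, 6], [4, 6, 6, 8])
f_stat = one_way_anova_f([1, 2, 3], [2, 3, 4], [3, 4, 5])

print(round(t_ind, 3), df_ind)
print(round(t_pair, 3), df_pair)
print(round(f_stat, 3))
```

In each case, a statistic that is large in absolute value relative to its degrees of freedom suggests that the difference in means is unlikely to be due to chance alone.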

Compare predicted and observed quantitative data with Chi-square

A chi-square test measures how well the data predicted by a model match the data actually observed. These tests are often used in hypothesis testing. The chi-square statistic summarizes the size of the differences between the expected and the observed data, given the sample size and the number of variables in the relationship.

For example, a marketing company wants to understand how purchasing patterns of luxury cosmetics vary with gender. A chi-square test can be applied to check whether consumers’ purchasing patterns are associated with gender.
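
The calculation behind the test can be sketched in a few lines of Python. The counts below are hypothetical (they are not data from the cosmetics example), and `chi_square` is an illustrative helper rather than an SPSS routine. Expected counts are derived from the row and column totals under the assumption that the two variables are unrelated.

```python
def chi_square(observed):
    """Chi-square statistic and degrees of freedom for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical counts: rows = gender, columns = (bought, did not buy).
table = [
    [30, 20],
    [20, 30],
]
stat, df = chi_square(table)
print(stat, df)
```

For this hypothetical table the statistic is 4.0 with 1 degree of freedom, which exceeds the 5% critical value of 3.841, so the association would be considered statistically significant at that level.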

Understand relationship between variables with correlation analysis

A correlation analysis tests the relationship between two continuous variables in terms of

  1. how strong the relationship is, and
  2. in what direction the relationship goes.

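The strength and direction of the relationship are given by a coefficient (Pearson’s r), which can be anything between -1 and 1. The sign shows the direction, and values closer to -1 or 1 indicate a stronger relationship.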

For example, correlation analysis can be used to examine the link between eating habits and childhood obesity. The coefficient is calculated from the paired values of the two variables.
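
As an illustration, Pearson's r can be computed directly from its definition: the sum of the products of the two variables' deviations from their means, divided by the square root of the product of their summed squared deviations. The helper name `pearson_r` and the paired values are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient for two equally long lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)
print(round(r, 2))
```

Here r is 0.8, a strong positive relationship: higher values of x tend to go with higher values of y.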

Factor analysis reduces the number of dimensions in large quantitative datasets

Exploratory factor analysis (hereafter referred to as factor analysis) helps to investigate the underlying structure in the pattern of correlations between several variables (often referred to as items). In a dataset with a large number of variables, factor analysis can help to investigate if the variables represent a smaller number of factors or dimensions.

For example, 250 students were asked about 50 different responses to television advertisements. Factor analysis can be applied to classify and reduce the 50 responses to 5 or 6 possible dimensions.

Regression analysis helps understand the relationship between variables

Regression analysis is a type of quantitative data analysis that models the relationship between variables by examining the impact of one or more independent variables on a dependent variable. The main types of regression are shown in the table below.

| Method | Description | Example |
| --- | --- | --- |
| Linear regression | Models the relationship between a dependent variable (y) and one or more explanatory variables (x) by fitting a linear equation to observed data. | To see if the number of furry pets (y) is related to having children (x), residential area (x), and income (x). |
| Logistic regression | Helps predict the likelihood of an event happening or a choice being made. | A logistic regression could be used to predict whether a political candidate will win or lose an election. |
| Multinomial regression | Used when the dependent variable is nominal with more than two categories. | Using features from chest X-ray images to predict one of three possible outcomes (no disease, viral pneumonia, COVID-19). |
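
The simplest case, a linear regression with a single explanatory variable, can be sketched with ordinary least squares as follows. The helper `fit_line` and the data are hypothetical. Models with several explanatory variables, as in the pets example, follow the same idea but are normally fitted with statistical software.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical observations of one explanatory variable (x) and one outcome (y).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
slope, intercept = fit_line(x, y)

# Predicted outcome for a new x value using the fitted equation.
predicted = intercept + slope * 6
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```

The slope estimates how much y changes, on average, for each one-unit increase in x.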

The nature of quantitative data is diverse. It can be said that no two datasets are the same. However, using basic methods of analysis simplifies the understanding of a complex dataset.
