How to develop a questionnaire for correlation and regression tests?

There are different instruments for collecting primary data, and the most widely used is the questionnaire in a survey method. Correlation and regression tests are two of the basic statistical tools widely applied to analyze data. In research, these tests are used when the researcher seeks to find relationships between variables, or the impact of some variables on others, in order to draw inferences. Correlation analysis measures the degree of linear association between any two variables. Regression analysis, on the other hand, helps to determine whether a set of variables (independent variables) has any impact on another variable (the dependent variable). Here ‘impact’ means how much of the variation in the values of the dependent variable can be explained by the variation in the values of the independent variables. Correlation analysis is usually done first. Once it is found that there is a significant relationship between the dependent and the independent variables, the researcher can proceed to conduct a regression analysis.
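As a minimal sketch of the two ideas above, the following Python snippet computes the Pearson correlation coefficient and fits a one-predictor regression on made-up scores. The variable names and the data are assumptions for illustration only; they do not come from any real survey. With a single independent variable, the share of variation explained (R-squared) equals the square of the correlation coefficient, which the last line checks.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r_squared), where r_squared is the share of
    the variation in y explained by x.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return intercept, slope, 1 - ss_res / ss_tot


# Hypothetical scores for eight respondents (illustration only).
workload = [1, 2, 2, 3, 3, 4, 4, 5]   # independent variable
wlb = [5, 4, 5, 3, 4, 2, 3, 1]        # dependent variable

r = pearson_r(workload, wlb)
intercept, slope, r_squared = simple_regression(workload, wlb)

print("correlation:", round(r, 3))
print("slope:", round(slope, 3), "R-squared:", round(r_squared, 3))
print("R-squared equals r squared:", abs(r ** 2 - r_squared) < 1e-9)
```

In practice these figures would come from statistical software; the hand-written functions are only meant to show what is being calculated.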

Need to develop a questionnaire

The need to develop a questionnaire arises when the study is based on primary data. A questionnaire is one of the most common research instruments for primary data collection. It can be quantitative or qualitative in nature. A qualitative questionnaire contains open-ended questions, and responses are typically collected through interviews. A quantitative questionnaire, in contrast, contains closed-ended questions whose responses take numerical values. These values are then coded using suitable statistical software such as SPSS, STATA or R, after which correlation and regression tests can be run. However, it is challenging to develop the questionnaire in a way that is suitable for these types of analyses.
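The coding step can be sketched in a few lines of Python. The five response labels and the 1 to 5 codes below are an assumption (a common five-point agreement scale), and `code_response` is a hypothetical helper, not a function of SPSS, STATA or R.

```python
# Assumed five-point agreement scale; labels and codes are illustrative.
LIKERT_CODES = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neither Agree nor Disagree": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}

# Case-insensitive lookup so that "agree" and "Agree" get the same code.
_CODES_BY_KEY = {label.casefold(): code for label, code in LIKERT_CODES.items()}


def code_response(label):
    """Return the numeric code for a response label.

    Blank or unrecognised answers return None so that they can be treated
    as missing values instead of being silently given a number.
    """
    return _CODES_BY_KEY.get(label.strip().casefold())


# Hypothetical raw answers from one respondent.
raw_answers = ["Agree", "strongly disagree", " Neither Agree nor Disagree ", ""]
coded = [code_response(answer) for answer in raw_answers]
print(coded)
```

Returning `None` for blanks keeps missing answers visible, so the analyst can decide how to handle them before running any test.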

Stages of developing a questionnaire for correlation and regression

  1. Identifying the dependent and independent variables– From the review of the literature, identify the dependent and independent variables that will establish the aim and objectives of the research. While reviewing the literature, focus on the objectives of previous studies and their findings.
  2. Framing the conceptual framework– Once identified, these variables can be presented in the form of a flow chart known as the conceptual framework, in which the independent variables are depicted as affecting the dependent variable. The conceptual framework is a part of the literature review.
  3. Framing the hypotheses– This step is very important for correlation and regression tests as it presents the conjectures in the form of testable statements. Typically, there are two types of hypotheses. The first is the null hypothesis, which states ‘no effect’ or ‘no impact’ of the independent variables on the dependent variable. The second is the alternative hypothesis, which contradicts the null hypothesis and states that ‘there is an effect’. These hypotheses can be framed from the depiction in the conceptual framework. Hypothesis framing is a part of the chapter on research methodology.
  4. Framing questions or statements in the questionnaire– A quantitative questionnaire generally contains two parts. The first part collects responses about the demographic profile and general background of the survey participants. The second part is for inferential analysis, including correlation and regression tests. For this purpose, frame a number of questions or statements that seek responses on a scale. The most commonly used scale in survey research is the Likert scale.

An example of a questionnaire

Suppose the investigation is about the effect of organizational factors on work-life balance (WLB). WLB is therefore the dependent variable in the study.

Stage 1

Let’s say the following independent variables are identified from the literature review.

  1. Compensation
  2. Safe environment at work
  3. Training
  4. Job engagement
  5. Workload
  6. Scope of promotion
  7. Social security
  8. Organizational support

Stage 2

Frame the conceptual framework.

Figure 1: Conceptual framework

Stage 3

Frame the hypotheses.

Null hypothesis– Organizational factors, namely compensation, safe environment at work, training, job engagement, workload, scope of promotion, social security and organizational support, do not have any effect on work-life balance.

Alternative hypothesis– Organizational factors, namely compensation, safe environment at work, training, job engagement, workload, scope of promotion, social security and organizational support, have a significant effect on work-life balance.
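In a multiple regression, a null hypothesis of this kind is commonly examined with the overall F statistic, which compares the variation explained by the independent variables with the unexplained variation. The sketch below only computes the statistic from an R-squared value; the R-squared of 0.5 and the sample of 109 respondents are made-up numbers, and `overall_f_statistic` is a hypothetical helper name.

```python
def overall_f_statistic(r_squared, n_respondents, n_predictors):
    """Overall F statistic of a regression with an intercept.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), where k is the number of
    independent variables and n the number of respondents.
    Returns (f_value, df_model, df_residual).
    """
    df_model = n_predictors
    df_residual = n_respondents - n_predictors - 1
    if df_residual <= 0:
        raise ValueError("need more respondents than predictors plus one")
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must lie in [0, 1)")
    f_value = (r_squared / df_model) / ((1 - r_squared) / df_residual)
    return f_value, df_model, df_residual


# Hypothetical figures: eight organizational factors, 109 respondents.
f_value, df_model, df_residual = overall_f_statistic(0.5, 109, 8)
print("F =", round(f_value, 2), "with", df_model, "and", df_residual, "degrees of freedom")
```

The statistic is then compared with the critical value of the F distribution for those degrees of freedom, a step that statistical software performs automatically by reporting a p-value. A small p-value leads to rejecting the null hypothesis.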

Stage 4

Frame questions and/or statements in the questionnaire as shown in the table. The survey participants can be asked to tick, against each statement, the box that best matches their view.


Statement | Strongly Disagree | Disagree | Neither Agree nor Disagree | Agree | Strongly Agree
There is a significant impact of organizational factors on WLB. | [ ] | [ ] | [ ] | [ ] | [ ]
Compensation at the workplace has an effect on WLB. | [ ] | [ ] | [ ] | [ ] | [ ]
A safe environment in the workplace has an effect on WLB. | [ ] | [ ] | [ ] | [ ] | [ ]
Training received at the workplace has an effect on WLB. | [ ] | [ ] | [ ] | [ ] | [ ]
Job engagement of employees affects their WLB. | [ ] | [ ] | [ ] | [ ] | [ ]
Workload has an effect on WLB. | [ ] | [ ] | [ ] | [ ] | [ ]
The scope of promotion at the workplace has an effect on WLB. | [ ] | [ ] | [ ] | [ ] | [ ]
Social security offered by employers at the workplace has an effect on WLB. | [ ] | [ ] | [ ] | [ ] | [ ]
Organizational support at the workplace has an effect on WLB. | [ ] | [ ] | [ ] | [ ] | [ ]

Table 1: Questionnaire statements for inferential analysis

In the above table, the first statement collects responses that will serve as the data for the dependent variable (WLB). All the other statements gather responses that provide the data for the respective independent variables or factors.
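One way to picture the resulting dataset is as one row per respondent, with the coded answers in the same order as the statements. The sketch below splits such rows into the dependent variable and the independent variables. The column names, the three rows of answers and the helper `split_variables` are all hypothetical.

```python
# Column order follows the statements in the table: the first statement
# measures the dependent variable, the remaining eight the factors.
VARIABLES = [
    "wlb",
    "compensation",
    "safe_environment",
    "training",
    "job_engagement",
    "workload",
    "promotion_scope",
    "social_security",
    "organizational_support",
]

# Hypothetical coded responses (1 = Strongly Disagree ... 5 = Strongly Agree),
# one row per respondent.
responses = [
    [4, 4, 5, 3, 4, 2, 4, 3, 4],
    [2, 2, 3, 2, 3, 4, 2, 2, 3],
    [5, 4, 4, 4, 5, 1, 5, 4, 5],
]


def split_variables(rows, names):
    """Turn respondent rows into (dependent_values, {factor: values})."""
    for row in rows:
        if len(row) != len(names):
            raise ValueError("each row needs one answer per statement")
    columns = {name: [row[i] for row in rows] for i, name in enumerate(names)}
    dependent = columns.pop(names[0])  # first statement = dependent variable
    return dependent, columns


y, X = split_variables(responses, VARIABLES)
print("WLB:", y)
print("workload:", X["workload"])
```

Keeping the column order tied to the statement order makes it easy to trace every number in the analysis back to a line of the questionnaire.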

Important points to note

  1. The statements in the questionnaire for correlation and regression tests need to follow the conceptual framework and the hypotheses directly.
  2. Independent variables found through correlation analysis to have a significant relationship with the dependent variable should ideally be included in the regression analysis.
  3. The example given in this article contains a small number of statements. In actual research, such as a PhD thesis, a large number of factors (independent variables) are often identified from the review of the literature. In those cases, it is suggested to perform an exploratory factor analysis (EFA) before correlation analysis. EFA helps to group similar factors into one and reduce the number of statements that are subsequently included in the correlation analysis.
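The screening described in the second point can be sketched as follows: correlate each factor with the dependent variable and keep only those whose correlation is statistically significant. The p-value here uses the Fisher z-transformation, a large-sample approximation chosen because it needs only the Python standard library; statistical packages normally use a t-test instead. The data, the 0.05 threshold and the function names are assumptions, and ten respondents are far too few for a real study.

```python
from math import atanh, sqrt
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def approx_p_value(r, n):
    """Approximate two-sided p-value for 'no correlation' (Fisher z)."""
    if n <= 3:
        raise ValueError("need more than three observations")
    if abs(r) >= 1:
        return 0.0  # perfect correlation; atanh would be undefined
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def screen_predictors(y, predictors, alpha=0.05):
    """Names of predictors significantly correlated with y."""
    kept = []
    for name, values in predictors.items():
        r = pearson_r(values, y)
        if approx_p_value(r, len(y)) < alpha:
            kept.append(name)
    return kept


# Hypothetical coded responses from ten respondents.
wlb = [1, 2, 2, 3, 3, 4, 4, 5, 5, 5]
factors = {
    "workload": [5, 5, 4, 4, 3, 3, 2, 2, 1, 1],
    "training": [3, 2, 4, 3, 3, 2, 4, 3, 3, 3],
}

print(screen_predictors(wlb, factors))
```

Only the factors returned by `screen_predictors` would then enter the regression model.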

Saptarshi Basu Roy Choudhury

Senior Research Analyst at Project Guru
Saptarshi has done his M. Phil in International Trade and Development and Masters in Economics from Jawaharlal Nehru University, New Delhi. His academic interests include issues related to economics of climate change, regulation and contemporary trade theories. He has a keen interest in current affairs and likes to read and travel in his spare time.
