Linear regression analysis using SPSS

Linear regression analysis is conducted to determine the relationship between a dependent variable and a set of multiple independent variables. The procedure can be used to determine whether, and to what extent, the independent variables influence the dependent variable. For example, a person’s household income (the dependent variable) may depend on a number of independent variables such as age, number of household members, and years with the current employer.
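Expressed as an equation, the model being estimated here is: Household income = b0 + b1(Age) + b2(Number of household members) + b3(Years with employer) + error, where b0 is the intercept and each remaining b is a regression coefficient estimated from the data.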

Linear regression with SPSS

  • Step 1: From the menu, choose Analyze -> Regression -> Linear, as shown in Figure 1 below:
Figure 1: Linear regression
  • Step 2: This opens the Linear Regression dialog box (Figure 2). Select Household Income in Thousands and move it to the Dependent list. Next, select the independent variables (Age, Number of people in household, and Years with current employer) and move them to the Independent(s) list. Click OK to run the test. (An equivalent scripted version is sketched below the figure.)
Figure 2: Linear regression
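For readers who prefer a scripted workflow, the same model can be fitted outside SPSS. Here is a minimal sketch using Python’s statsmodels; the file name and column names are hypothetical stand-ins for the dataset used above:

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical dataset containing the variables used in this example
df = pd.read_csv("household_survey.csv")

y = df["income_thousands"]  # dependent variable: household income in thousands
X = df[["age", "household_size", "years_with_employer"]]  # independent variables
X = sm.add_constant(X)  # add the intercept (constant) term

results = sm.OLS(y, X).fit()  # ordinary least squares, SPSS's "Enter" method
print(results.summary())  # prints R-squared, the ANOVA F-test, and coefficients
```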

Inferences: Table 1 below shows the Model Summary for the present test. The model fit output consists of the “Model Summary” table and the ANOVA table (Table 2). The Model Summary includes the multiple correlation coefficient R, its square R², and the adjusted version of this coefficient as summary measures of model fit. As can be seen, R = 0.799, which indicates a strong correlation between the dependent and independent variables (the closer the figure is to 1.000, the stronger the correlation). R² = 0.638 means that the independent variables explain 63.8% of the variability in household income in the sample. The adjusted R² gives a revised estimate: 60.8% of the variability in household income in the sample is explained by the three independent variables (Years with Current Employer, Age, and Number of people in Household).

Further, the Standard Error of the Estimate reported in Table 1 is 12.021. This is the standard deviation of the residuals (the typical distance of the observations from the fitted regression line), and it is small considering that average household income in the sample ranges from 5,000 to 25,000 Rs.
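Assuming the model has been fitted with the statsmodels sketch above, the same summary measures can be read off the fitted result; the attribute names below belong to statsmodels, not SPSS:

```python
import numpy as np

def model_summary(results):
    """Reproduce the SPSS Model Summary statistics from a fitted OLS result."""
    return {
        "R": np.sqrt(results.rsquared),                       # multiple correlation
        "R Square": results.rsquared,
        "Adjusted R Square": results.rsquared_adj,
        "Std. Error of the Estimate": np.sqrt(results.mse_resid),
    }
```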

Table 1: Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .799a   .638       .608                12.021

a. Predictors: (Constant), Years with Current Employer, Age, Number of people in Household

Legends

  • R: The multiple correlation coefficient, which tells us how strongly the set of independent variables is related to the dependent variable.
  • R Square: Indicates how much of the total variation in the dependent variable is explained by the independent variables.
  • Adjusted R Square: A version of R Square adjusted downward for the number of predictors in the model, correcting for the fact that R Square can only increase as variables are added (see the sketch below this list).
  • Std. Error of the Estimate: Represents the standard deviation of the error term (the residuals).
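As a sketch of the adjustment, adjusted R² can be computed directly from R², the sample size n, and the number of predictors k using the standard formula; the variable names here are illustrative:

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Penalize R-squared for the k predictors used, given n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
```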

Setting the significance level at 5% (i.e. a 95% confidence level), the ANOVA test (Table 2 below) provides an F-test of the null hypothesis that none of the independent variables is related to household income. Based on the analysis, we can reject this null hypothesis, since F = 321.34 and p = 0.001 (p < 0.01), and conclude that years with current employer, age and number of people in household together have a significant relationship with household income. Note that the F-test here evaluates all three variables jointly. If the result were not significant at this step, we would not proceed to the next step, the t-tests on the individual coefficients.

Table 2: ANOVAb (Analysis of Variance)

Model           Sum of Squares   df    Mean Square   F        Sig.
1  Regression   430.337          3     143.446       321.34   .001a
   Residual     .000             251   .000
   Total        430.337          254
a. Predictors: (Constant), Years with Current Employer, Age, Number of people in Household
b. Dependent Variable: Household Income in Thousands

Legends

  • Sum of Squares: Associated with the three sources of variance: Total, Regression and Residual. This measure is typically not reported when presenting results.
  • df: The degrees of freedom associated with each source of variance. The total df is N − 1, i.e. the number of respondents in the sample minus 1; the regression df equals the number of predictors, and the residual df is N minus the number of predictors minus 1.
  • Mean Square: Obtained by dividing each Sum of Squares by its respective df.
  • F: The value obtained by dividing the Mean Square for Regression by the Mean Square for Residual, which here gives 321.34. A value well above the critical F value for the model’s degrees of freedom indicates a significant model, and it complements the Sig. value (a sketch of the computation follows this list).
  • Sig.: Reflects the significance of the regression model: a value below 0.05 means the model is significant at the 95% confidence level, and a value below 0.01 means it is significant at the 99% confidence level.
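To make the F computation concrete, here is a minimal sketch that reproduces the F statistic and its p-value from the mean squares in an ANOVA table (uses scipy; the function name is illustrative):

```python
from scipy import stats

def overall_f_test(ms_regression: float, ms_residual: float,
                   df_regression: int, df_residual: int):
    """F = MS(Regression) / MS(Residual); p is the upper-tail F probability."""
    f = ms_regression / ms_residual
    p = stats.f.sf(f, df_regression, df_residual)
    return f, p
```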

The table above revealed that the three variables combined have a significant relationship with “household income”. In the next step we determine the relationship between each individual independent variable and “household income”.

The output shown in Table 3 (Coefficients) provides the estimated regression coefficients, their standard errors, t-tests, and significance values. The estimated regression coefficients appear under “Unstandardized Coefficients B”; each one predicts the change in the dependent variable (household income) when the corresponding independent variable (Age / Number of people in household / Years with current employer) increases by one unit, with all other variables in the model held constant.

To test the null hypothesis for each predictor, refer to the t-statistic and its significance value. Here, the significance value for Age (0.456) indicates that the age of the individual has no significant effect on household income (at a 95% confidence level, the sig. value must be less than 0.05 for the relationship to be considered significant).

Table 3: Coefficientsa

Model                             Unstandardized Coefficients   Standardized Coefficients   t       Sig.
                                  B          Std. Error         Beta
1  (Constant)                     11.306     7.315                                          1.546   .131
   Age                            .464       .130               .439                        3.564   .456
   Number of people in Household  .156       .205               .082                        .754    .001
   Years with Current Employer    21.071     4.561              .487                        4.315   .000

a. Dependent Variable: Household Income in Thousands

Legends

  • Unstandardized Coefficients (B): The values for the regression equation that predict the dependent variable from the independent variables. In simpler terms, each reflects the change in the dependent variable for a one-unit change in the corresponding predictor (independent variable).
  • Std. Error: The standard errors associated with the coefficients.
  • Standardized Coefficients (Beta): The coefficient values that would be obtained if all predictors (independent variables) were standardized prior to the analysis, i.e. measured on the same scale, which makes the coefficients comparable across predictors.
  • t: This value, along with the sig. value, determines whether we reject or retain the null hypothesis. The two complement each other: a larger absolute t-value corresponds to a lower sig. value (a sketch follows this list).
  • Sig.: As indicated above, a value below 0.05 indicates significance at the 95% confidence level, and a value below 0.01 indicates significance at the 99% confidence level.
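To illustrate how the t and Sig. columns relate, here is a minimal sketch computing a coefficient’s t statistic and its two-sided p-value from the estimate and its standard error (uses scipy; the function name is illustrative):

```python
from scipy import stats

def coefficient_t_test(b: float, std_error: float, df_residual: int):
    """t = B / Std. Error; the two-sided p-value uses the residual df."""
    t = b / std_error
    p = 2 * stats.t.sf(abs(t), df_residual)
    return t, p
```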

In the above demonstration, we concluded two things:

  1. The three independent variables together explain 63.8% of the variability in household income.
  2. The relationship between Age (independent variable) and household income (dependent variable) is not significant (sig. = 0.456).

The above method of regression is called the “Enter” method. Apart from this, there are three other major methods of regression, though they are used less often: Forward, Backward and Stepwise regression.

Forward Selection

This method starts with a model containing none of the independent variables. In the first step, the procedure considers the variables one by one and selects the one whose inclusion produces the largest increase in R² (explained variability). In the second step, it considers adding variables to the model that already contains the variable selected in the first step. At each step, the variable giving the largest increase in R² is added, until, according to an F-test, further additions are judged not to improve the model. A sketch of this procedure follows.
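Here is a minimal sketch of forward selection in Python (statsmodels), using each candidate’s p-value as the entry criterion in place of SPSS’s F-to-enter, to which it is equivalent when a single variable is added:

```python
import statsmodels.api as sm

def forward_selection(X, y, p_enter=0.05):
    """Add, at each step, the candidate predictor giving the largest increase
    in R-squared; stop when no candidate enters below the p-value threshold."""
    selected, remaining = [], list(X.columns)
    while remaining:
        best_var, best_r2, best_p = None, -1.0, 1.0
        for var in remaining:
            fit = sm.OLS(y, sm.add_constant(X[selected + [var]])).fit()
            if fit.rsquared > best_r2:
                best_var, best_r2, best_p = var, fit.rsquared, fit.pvalues[var]
        if best_var is None or best_p > p_enter:
            break  # further additions do not improve the model
        selected.append(best_var)
        remaining.remove(best_var)
    return selected
```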

Backward Selection

This method starts with a model containing all the variables and eliminates them one by one, at each step excluding the variable whose removal leads to the smallest decrease in R². The procedure is repeated until, according to an F-test, further exclusions would represent a deterioration of the model. In code, it mirrors the forward-selection loop above, removing rather than adding variables.

Stepwise Selection

In a study with a large number of independent variables, where you want to develop a regression model that includes only the variables statistically related to the dependent variable, you can choose the “Stepwise” method from the Method drop-down list (Figure 3). With “Stepwise”, only variables that meet the criteria set in the Linear Regression Options dialog box enter the equation; a sketch combining forward entry with backward removal follows the figure below.

Figure 3: Linear regression
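As a sketch, stepwise selection combines the two procedures described above: a forward entry step followed by a check that removes any previously entered variable whose p-value has deteriorated (the thresholds below are illustrative defaults, not SPSS’s):

```python
import statsmodels.api as sm

def stepwise_selection(X, y, p_enter=0.05, p_remove=0.10):
    """Alternate forward entry and backward removal until neither applies."""
    selected = []
    while True:
        changed = False
        # Forward step: enter the excluded variable with the smallest p-value
        candidates = [c for c in X.columns if c not in selected]
        entry_p = {}
        for var in candidates:
            fit = sm.OLS(y, sm.add_constant(X[selected + [var]])).fit()
            entry_p[var] = fit.pvalues[var]
        if entry_p:
            best = min(entry_p, key=entry_p.get)
            if entry_p[best] < p_enter:
                selected.append(best)
                changed = True
        # Backward step: remove the entered variable with the worst p-value
        if selected:
            fit = sm.OLS(y, sm.add_constant(X[selected])).fit()
            pvals = fit.pvalues.drop("const")
            worst = pvals.idxmax()
            if pvals[worst] > p_remove:
                selected.remove(worst)
                changed = True
        if not changed:
            return selected
```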