Regression analysis is a statistical technique that links variables and measures the strength of the relationship between them. As stated in the previous article, regression analysis is used to determine the influence of independent variables on a dependent variable. It is therefore essential that the conclusions drawn about the relationship between the variables are both significant and reliable.
For example, consider a study to determine the impact of emotional awareness on the creativity level of students. To do so, the study must determine the strength of the relationship between the factors affecting emotional awareness and the creativity level of students.
The previous article discussed the process of regression analysis and the method for interpreting its results. However, the analysis can produce insignificant or biased results. This article therefore explains why biases need to be removed from the data in order to derive accurate results.
What influences the significance of regression analysis results?
Regression analysis results are mainly divided into three parts:
- model summary,
- ANOVA results, and
- coefficient table.
Each part provides information about how well the model captures the relationship between the independent and dependent variables.
Factors affecting the model summary results
The first part of the regression output reports the coefficient of determination (R2) and the Adjusted R2. Both values express the proportion of variation in the dependent variable that is explained by the independent variables included in the model.
In simple linear regression the R2 value, and in multiple linear regression the Adjusted R2 value, summarizes how well the model fits. As a general rule of thumb for an overview of the model, both R2 and Adjusted R2 should be greater than 0.5.
The formulas for R2 and Adjusted R2 show that the stronger the correlation between the variables, the higher both values are. For a given R2, the Adjusted R2 also increases when the number of observations is larger or the number of independent variables is smaller, because it penalizes R2 for each additional predictor relative to the sample size. R2 itself, by contrast, never decreases when an independent variable is added.
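The effect of sample size and model size on the Adjusted R2 can be checked with a few lines of code. The following is a minimal sketch using the standard textbook formulas; the function names and the numbers (an R2 of 0.60, 30 or 300 observations, 2 or 5 predictors) are made up for illustration and do not come from any study.

```python
def r_squared(y, y_hat):
    """Share of the variation in y that the fitted values y_hat reproduce."""
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - ss_residual / ss_total


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalise R2 for the number of predictors relative to the sample size."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Same R2 of 0.60 in every case; only sample size and model size change.
print(round(adjusted_r_squared(0.60, n_obs=30, n_predictors=5), 3))   # about 0.517
print(round(adjusted_r_squared(0.60, n_obs=300, n_predictors=5), 3))  # about 0.593
print(round(adjusted_r_squared(0.60, n_obs=30, n_predictors=2), 3))   # about 0.57
```

With the same R2, the adjusted value moves closer to R2 as the sample grows or as predictors are dropped, which is why a small sample combined with many predictors can pull a model below the 0.5 guideline.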
Factors affecting the ANOVA results
The F-ratio reported in the ANOVA table is the ratio of the improvement in prediction of the dependent variable due to the model to the inaccuracy that remains in the model. The F-ratio should be greater than 1. Because the F-ratio compares explained variation with unexplained variation, high unexplained variability in the dataset tends to reduce its value. Hence, to raise the F-ratio above 1, the variability that the model cannot explain should be small. Furthermore, for a given strength of relationship, a larger number of observations also increases the F-ratio and strengthens the evidence that the independent variables predict the dependent variable.
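To illustrate how scatter in the data affects the F-ratio, the sketch below fits a simple linear regression by least squares and computes F as the mean square explained by the model divided by the mean square left in the residuals. The two small datasets are invented for illustration, and the function names are hypothetical; this is one straightforward way to do the calculation, not the output of any particular statistics package.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def f_ratio(x, y):
    """F = mean square of the model / mean square of the residuals."""
    n, k = len(x), 1  # k is the number of predictors (one here)
    intercept, slope = fit_simple_ols(x, y)
    fitted = [intercept + slope * xi for xi in x]
    mean_y = sum(y) / n
    ss_model = sum((fi - mean_y) ** 2 for fi in fitted)
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return (ss_model / k) / (ss_residual / (n - k - 1))


# Invented data: a similar upward trend, with little or a lot of scatter.
x = [1, 2, 3, 4, 5, 6]
y_tight = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
y_noisy = [4.0, 1.5, 8.5, 5.0, 12.5, 9.5]

print(f_ratio(x, y_tight))  # very large: almost no unexplained variation
print(f_ratio(x, y_noisy))  # still above 1, but far smaller
```

Both datasets rise with x, yet the noisy one yields a much smaller F-ratio because more of its variation is left unexplained by the fitted line.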
Factors affecting the significance of coefficients
Coefficient values determine whether an independent variable has a relevant or significant impact on the dependent variable. The t-score of a coefficient, together with its p-value, defines the significance of that coefficient in the model. The p-value should be less than the level of error or insignificance (1%, 5% or 10%) allowed in the model. The formula for the t-score shows that the higher the variability in the dataset, the larger the standard error of the coefficient and the lower the t-score. Thus, the t-score is weakened by high variability. Furthermore, a smaller number of observations and a larger number of coefficients (many independent variables) in the model also tend to reduce the significance of the results, since both leave fewer degrees of freedom. Hence, to derive significant coefficients, it is necessary to raise the number of observations (sample size) and to reduce the number of independent variables and the variability of the dataset.
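In practice, the comparison with the 1%, 5% or 10% level is made using the p-value attached to the t-score. The sketch below computes the t-score of the slope in a simple linear regression and approximates its two-sided p-value by numerically integrating the t density. Both function names and the datasets are invented for illustration, and the integration routine is a rough stand-in for the t-distribution tables or functions that a statistics package would normally provide.

```python
import math


def slope_t_statistic(x, y):
    """Return (t, df) where t = slope / standard error of the slope."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    ss_residual = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    df = n - 2  # observations minus the two estimated coefficients
    std_error = math.sqrt(ss_residual / df / sxx)
    return slope / std_error, df


def two_sided_p_value(t, df, steps=2000):
    """Approximate P(|T| > |t|) with Simpson's rule (fine for small df)."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return const * (1 + u * u / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps
    total = pdf(0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central_area = 2 * total * h / 3  # area between -|t| and +|t|
    return max(0.0, 1 - central_area)


# Invented data: a similar upward trend, with little or a lot of scatter.
x = [1, 2, 3, 4, 5, 6]
y_tight = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
y_noisy = [4.0, 1.5, 8.5, 5.0, 12.5, 9.5]

for label, y in (("tight", y_tight), ("noisy", y_noisy)):
    t, df = slope_t_statistic(x, y)
    print(label, round(t, 2), round(two_sided_p_value(t, df), 4))
```

With only six observations, the noisy dataset gives a p-value between 0.05 and 0.10, so its slope would count as significant at the 10% level but not at 5% or 1%, while the low-scatter dataset passes all three.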
Why is dataset processing needed for regression analysis?
Regression analysis helps in stating the influence of independent variables on the dependent variable. It is therefore necessary to ensure that the dataset is free from anomalies and outliers. However, because of randomness and biases in human behaviour, the analysis often yields inadequate or inefficient results. As social science studies analyse the perceptions of people, there is a high possibility of variability in the dataset. Thus, to control the variability caused by respondent biases, the dataset must be processed before statistical analysis.
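As one possible first step in such processing, the sketch below flags candidate outliers with the interquartile range (IQR) rule. The IQR rule and the 1.5 multiplier are a common screening convention assumed here for illustration; the article does not prescribe a specific method, and the survey scores are invented. The function name is hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Invented responses on a 1-5 scale; 25 looks like a data-entry error.
scores = [3, 4, 4, 5, 3, 4, 5, 4, 25, 4]
print(iqr_outliers(scores))  # [25]
```

Flagged values are best reviewed rather than deleted automatically, since an extreme response may be genuine rather than an error.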