# Understanding the correlation and regression analysis values

By Riya Jain & Priya Chetty on August 23, 2021

The previous article explained the basic terminology of statistics. In a dataset, there can be more than one parameter; for instance, height, weight and age. This means that the dataset has 3 ‘parameters’, or ‘variables’. Sometimes these variables may be interlinked, i.e. the height can affect the weight, or age can affect the height and weight. In statistics, the correlation test helps to find out if such a relationship exists (Hayes and Anderson, 2021). This article explains the different correlation and regression analysis values that are generated after conducting the tests, covering their meaning, their importance and how to interpret them.

## Correlation test values

In a dataset, a relationship between two variables is represented by a value called the ‘correlation coefficient’, which measures the strength and direction of their linear association. The value of the correlation coefficient lies between -1 and 1, wherein -1 indicates a perfectly negative relationship, 0 indicates no linear relationship between the variables, and 1 indicates a perfectly positive relationship. The explanation of different correlation analysis values is provided below.
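As a minimal sketch of how the coefficient behaves at its two extremes, the snippet below computes the Pearson correlation coefficient by hand. The function name `pearson_r` and the age and weight figures are hypothetical and chosen only for illustration; in practice a statistics package would report this value.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Co-variation of x and y around their means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Variation of each variable around its own mean
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data, not taken from any real study
age = [20, 25, 30, 35, 40]
weight_up = [55, 60, 65, 70, 75]    # rises in step with age
weight_down = [75, 70, 65, 60, 55]  # falls in step with age

r_up = pearson_r(age, weight_up)      # perfect positive relationship
r_down = pearson_r(age, weight_down)  # perfect negative relationship
print(r_up, r_down)
```

Real data rarely sit exactly on a line, so observed coefficients usually fall somewhere between these two extremes.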

## Regression analysis values

After finding out that two variables are correlated, a researcher typically moves to another step called regression. The regression test finds out the degree of impact of one variable on another (Beers, 2021). For instance, if a correlation test finds that age and weight are interlinked, then the regression will find out to what extent the age affects the weight or vice versa.

In the regression test, there are several values that represent the results. The key terms of regression analysis values are as follows.

## R square (R²)

The value of R square helps in determining the proportion of variance in the dependent variable that is explained by the independent variable. It represents how well the regression model fits the data. As a rule of thumb, a value of more than 0.5 is often considered effective. However, this is not a strict requirement, as R square only states the proportion of variance explained and does not define the size of the impact. For example, an R square value of 0.7 in a model of income and happiness states that about 70% of the variation in happiness is explained by income.
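To make the ‘proportion of variance explained’ concrete, here is a small sketch that fits a straight line by ordinary least squares and computes R square as one minus the ratio of unexplained to total variation. The helper names and the income and happiness scores are made up for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared(x, y):
    """Share of the variation in y that the fitted line explains."""
    intercept, slope = fit_line(x, y)
    mean_y = sum(y) / len(y)
    # Variation left over after the fit (residual sum of squares)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    # Total variation of y around its mean
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot

# Hypothetical scores on arbitrary 1-5 scales
income = [1, 2, 3, 4, 5]
happiness = [2, 1, 4, 3, 5]

r2 = r_squared(income, happiness)
print(f"R square = {r2:.2f}")  # about 0.64 for this toy data
```

With these numbers, roughly 64% of the variation in the happiness scores is explained by income, while the remaining 36% is left unexplained by the model.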

In cases where there is more than one independent variable, the R square value is adjusted for the number of independent variables included in the model. This is called adjusted R². For instance, if we want to find out how age and height affect weight, there are two independent variables (age and height) and one dependent variable, ‘weight’. Here, the adjusted R square is used to determine the capacity of the model instead of the R square. The rule of thumb is similar for adjusted R square, i.e. the value should be more than 0.5, but it does not determine the impact. For example, an adjusted R square value of 0.8 means that age and height together explain about 80% of the variation in the weight of a person.
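The adjustment can be sketched with the standard formula, adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k is the number of independent variables. The function name and the input values below are hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R square for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical model: weight explained by age and height (k = 2)
r2 = 0.82   # assumed unadjusted R square
n = 30      # assumed sample size

adj_r2 = adjusted_r_squared(r2, n, k=2)
print(f"Adjusted R square = {adj_r2:.3f}")  # slightly below 0.82
```

The adjusted value is never higher than the unadjusted one, and the gap widens as more independent variables are added relative to the sample size.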

## F ratio

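The F-ratio is typically used to compare the means of groups against the overall variance of the model. In regression analysis, however, it tests whether the model as a whole explains a significant share of the variation in the dependent variable, by comparing the variance explained by the model with the unexplained (residual) variance. A value well above 1 suggests that the model explains more than random noise would. Whether the model can be used for determining the impact is decided by comparing the F-ratio with its tabulated value, or by checking that its p-value is below the significance level.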
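For a model with an intercept, the F-ratio can be written in terms of R square as F = (R²/k) / ((1 - R²)/(n - k - 1)). The sketch below uses this identity with assumed values of R square, n and k to show that the same R square can produce very different F-ratios depending on the sample size.

```python
def f_ratio(r2, n, k):
    """Overall F-ratio of a regression with k independent variables."""
    explained = r2 / k                    # explained variation per variable
    unexplained = (1 - r2) / (n - k - 1)  # residual variation per degree of freedom
    return explained / unexplained

# Same hypothetical R square, two hypothetical sample sizes
f_small = f_ratio(0.64, n=5, k=1)
f_large = f_ratio(0.64, n=50, k=1)

print(f"n = 5:  F = {f_small:.2f}")
print(f"n = 50: F = {f_large:.2f}")
# With n = 5 the tabulated 5% value of F(1, 3) is about 10.13,
# so the smaller F-ratio is above 1 but still not significant.
```

This is why the F-ratio is read together with its tabulated value or p-value rather than against a fixed cut-off.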

## Beta value

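The beta value is also known as the ‘coefficient’ in the regression table. It is the measure of the contribution of an independent variable to the dependent variable. In simpler terms, it represents the magnitude and direction of impact. For example, in the model of age’s impact on weight, a beta value of 0.21 states that a one-unit increase in age (e.g. one year) is associated with a 0.21-unit increase in weight (e.g. 0.21 kg). The percentage interpretation, i.e. a 1% increase in age increasing weight by 0.21%, applies only when both variables are log-transformed.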
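The unit interpretation can be checked with a short sketch: fit a line to toy data and compare the predicted weight at two ages one year apart. The data are hypothetical and were picked so that the slope comes out at 0.21.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope

# Hypothetical data: age in years, weight in kg
age = [20, 30, 40, 50, 60]
weight = [60, 62, 65, 66, 68.5]

intercept, beta = fit_line(age, weight)

# Predicted weights one year apart differ by exactly the beta value
weight_at_40 = intercept + beta * 40
weight_at_41 = intercept + beta * 41
print(f"beta = {beta:.2f}")
print(f"change per extra year = {weight_at_41 - weight_at_40:.2f} kg")
```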

## Significance level

It is the maximum risk of error that the researcher defines at the beginning of the research. In other words, it is the threshold for tolerating a false positive (finding an effect that does not actually exist) in the model. Before running a regression test, the researcher decides the extent of error he or she can tolerate in the result (1%, 5% or 10%). The lower the percentage of error, the stronger the result. However, most studies will have some extent of error. Therefore, a significance level of 5% (confidence level of 95%) and a significance level of 10% (confidence level of 90%) are the most popular.

## p-value

This is the most important value in the regression test. Based on this value, the null hypothesis is either rejected or not rejected. To reject it, the p-value needs to be less than the significance level. For example, if the researcher decides to test the impact of age on weight at a 95% confidence level (a significance level of 5%), then the p-value should be less than 0.05. If the researcher chooses a 90% confidence level, then the p-value should be less than 0.10.
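The decision rule is simple enough to express in a few lines. The sketch below applies it to one assumed p-value at three common significance levels; the function name `decide` and the p-value of 0.03 are hypothetical.

```python
def decide(p_value, alpha):
    """Return the test decision for a p-value at significance level alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

p_value = 0.03  # hypothetical p-value from a regression table

for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f}: {decide(p_value, alpha)}")
```

The same p-value leads to rejection at the 5% and 10% levels but not at the stricter 1% level, which is why the significance level has to be fixed before the test is run.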

## T-statistic

The test statistic measures how far the estimated value lies from the value assumed under the null hypothesis, relative to its standard error. The Z-score and the t-statistic are the most commonly used test statistics in regression analysis. If the absolute value of the test statistic is more than the tabulated (critical) value, then the null hypothesis is rejected. For example, in studying the impact of age on weight, if the t-statistic is 2.28 while the tabulated value is 1.96 at a 5% significance level, then the null hypothesis of no impact is rejected.
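The comparison in this example can be reproduced with Python's standard library. This is a sketch under one stated assumption: the sample is large enough for the standard normal distribution to stand in for the t-distribution, which is where the tabulated value of 1.96 comes from. With a small sample, the tabulated t value would be larger.

```python
from statistics import NormalDist

alpha = 0.05           # 5% significance level
test_statistic = 2.28  # value taken from the example above

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Two-sided critical value: 2.5% of the probability in each tail
critical_value = standard_normal.inv_cdf(1 - alpha / 2)

# Two-sided p-value under the large-sample (normal) assumption
p_value = 2 * (1 - standard_normal.cdf(abs(test_statistic)))

reject_null = abs(test_statistic) > critical_value

print(f"critical value = {critical_value:.2f}")
print(f"p-value = {p_value:.4f}")
print("reject the null hypothesis" if reject_null else "fail to reject the null hypothesis")
```

Both routes give the same decision: the test statistic exceeds the critical value, and the corresponding p-value falls below 0.05.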

Knowledge of the above terms makes interpretation of statistical tests, particularly correlation and regression, much simpler.