Correlation analysis using STATA

Correlation analysis is conducted to examine the strength and direction of the linear relationship between variables. There are two types of correlation analysis in STATA.

  1. Pairwise correlation, which treats each pair of variables separately and includes every observation that has valid values for that pair.
  2. Normal correlation, which treats the data set as a whole and uses only the observations that have valid values for all the variables being correlated.

In other words, both types compute the linear relationship between the variables; the only difference is in the way missing values are handled. In pairwise correlation, an observation is excluded only from the pairs in which one or both of its values are missing, whereas in normal correlation an observation with a missing value in any of the listed variables is excluded from the entire computation. If the varlist is not specified, the matrix is displayed for all the variables in the dataset.
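The difference in how missing values are handled can be seen in a minimal sketch. It assumes the example data are STATA's bundled auto dataset (loaded with sysuse auto), which this article does not state explicitly; the variable names are the ones used in the output later in the article.

```stata
* Assumption: the bundled auto dataset, in which rep78 has missing values
sysuse auto, clear

* Count the observations that are complete on all four variables
count if !missing(price, mpg, rep78, headroom)

* correlate: one common sample of complete observations for the whole matrix
correlate price mpg rep78 headroom

* pwcorr: pairwise deletion; obs prints the sample size behind each coefficient
pwcorr price mpg rep78 headroom, obs
```

In the first matrix every coefficient rests on the same complete observations, while in the second the pairs that do not involve rep78 use all the available observations.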

Doing correlation analysis using the drop-down menu

Correlation

Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Correlations and covariances

Pairwise correlation

Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Pairwise correlations

Defining relationship between variables using correlation analysis in STATA

The example in this article uses pairwise correlation, since it keeps every observation that is valid for a given pair of variables. From the drop-down list, select the variables that you want to correlate.

Various options (significance level, number of observations) available for correlation analysis in STATA

Using the graphical user interface, the correlation can be run by selecting the variables in the dialog box. Next, check the boxes titled:

  • Print number of observations for each entry.
  • Print significance level for each entry.
  • Significance level for displaying with a star.

See image below:

Selecting different options for correlation analysis in STATA

Commands used for pairwise correlation

The basic command for pairwise correlation is:

pwcorr VariableA VariableB

If one wants STATA to report the p-value (the level of statistical significance) for each coefficient, the sig option is added after a comma at the end of the command, as shown below:

pwcorr VariableA VariableB, sig

If the researcher wants to flag the coefficients that are significant at a specific significance level (e.g. p < .05 or p < .01), the star() option is added after sig, with the chosen level in parentheses (.05 or .01):

pwcorr VariableA VariableB, sig star(.05)

If the researcher also wants to see the number of observations (N, or sample size) used for each coefficient, the obs option is added:

pwcorr VariableA VariableB, sig star(.05) obs
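The commands above can be collected into a short do-file. This is a minimal sketch that assumes the bundled auto dataset, with price and mpg standing in for VariableA and VariableB; the last two lines rely on the results that pwcorr leaves behind in r().

```stata
* Assumption: auto dataset; price and mpg stand in for VariableA and VariableB
sysuse auto, clear

* Coefficient only
pwcorr price mpg

* Add p-values, a star at the 5% level, and the number of observations
pwcorr price mpg, sig star(.05) obs

* Stored results of the last pwcorr call
display "rho = " %6.4f r(rho) "   N = " r(N)
matrix list r(C)
```

Reading the coefficient from r(rho) or r(C) avoids retyping values from the results window when they are needed in later steps.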

Output for pairwise correlation in STATA

The pairwise correlation was done between price, mileage (mpg), repair record 1978 (rep78) and headroom. For each pair of variables, the table below reports the Pearson correlation coefficient, the significance value (p-value) beneath it, and the sample size beneath that. The sample size varies by pair: it is 69 for pairs involving rep78 and 74 for the rest.

pwcorr price mpg rep78 headroom, obs sig star(.05)

             |    price      mpg    rep78 headroom
-------------+------------------------------------
       price |   1.0000
             |
             |       74
             |
         mpg |  -0.4686*  1.0000
             |   0.0000
             |       74       74
             |
       rep78 |   0.0066   0.4023*  1.0000
             |   0.9574   0.0006
             |       69       69       69
             |
    headroom |   0.1145  -0.4138* -0.1480   1.0000
             |   0.3313   0.0002   0.2249
             |       74       74       69       74

The output shows a negative correlation between mpg and the price of the car (-0.4686), which is significant at the 5% significance level. Similarly, a significant negative correlation exists between headroom and mpg (-0.4138). Further, there is a significant positive correlation between rep78 and mpg (0.4023). The remaining correlations are not significant at the 5% level.
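The significance value behind a star can be checked by hand. The sketch below takes the coefficient for price and mpg and its sample size from the output above and applies the usual t test for a Pearson correlation with n - 2 degrees of freedom; the scalar names are illustrative only.

```stata
* Values read from the output: r for price and mpg, and its sample size
scalar rho_hat = -0.4686
scalar n_obs   = 74

* t statistic for H0: rho = 0, with n - 2 degrees of freedom
scalar t_stat = rho_hat * sqrt(n_obs - 2) / sqrt(1 - rho_hat^2)

* Two-sided p-value from the t distribution
scalar p_val = 2 * ttail(n_obs - 2, abs(t_stat))

display "t = " %6.3f t_stat "   p = " %8.6f p_val
```

A p-value below .05 here agrees with the star printed next to the coefficient.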

Priya Chetty

Partner at Project Guru
Priya is a master in business administration with majors in marketing and finance. She is fluent with data modelling, time series analysis, various regression models, forecasting and interpretation of the data. She has assisted data scientists, corporates, scholars in the field of finance, banking, economics and marketing.
