Problems faced during statistical analysis using panel data with STATA

In one of my recent projects I had to analyze panel data. Along the way I ran into several problems that are probably among the most common in panel data analysis. Here are those problems, along with the solutions that worked for me.

Importing panel data into STATA

Problem: The first step in any statistical analysis is to import the data into the statistical software. In my case I had to import the data from Excel sheets. Unfortunately, older versions of STATA cannot read Excel sheets saved as xls or xlsx directly (from STATA 12 onward, the “import excel” command can).

Solution: I exported the Excel sheet in CSV (MS-DOS) format and then imported it into STATA.
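
A minimal sketch of the CSV route is given below. It is only an illustration under a few assumptions: STATA 13 or later (for “import delimited”), internet access for “webuse”, and the Grunfeld example dataset standing in for the real spreadsheet. The file name grunfeld.csv and the commented-out mydata.xlsx are hypothetical.

```stata
* Assumption: internet access, so that webuse can fetch the example data
webuse grunfeld, clear

* Write a CSV file to stand in for the sheet exported from Excel
export delimited using "grunfeld.csv", replace

* Import the CSV file; the first row is read as variable names
import delimited using "grunfeld.csv", clear
describe

* In STATA 12 or later an Excel file can also be read directly, e.g.
* (mydata.xlsx is a hypothetical file name):
* import excel using "mydata.xlsx", firstrow clear
```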

Panel data management

Problem: One of the major problems I faced during the panel data analysis was data management. If the data are not arranged properly, it is very difficult to get the regression results. Even if results are obtained, they will not be reliable.

Solution: For panel data analysis the data should be saved in long format, with one row per unit and time period. For example, if we have data for 5 countries for 5 years, then the data for one country (country A in this case) should be in the following format.

Country   Id   T (time period)   Variable 1   Variable 2   Variable 3
A         1    2001
A         1    2002
A         1    2003
A         1    2004
A         1    2005
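
Once the data are in this layout, STATA has to be told which variable identifies the panel and which one identifies time. The sketch below types in a tiny dataset for two countries. The values of var1 are made up purely for illustration, and the variable names (country, id, year, var1) are assumptions, not names from my project.

```stata
clear
* Hypothetical long-format panel: one row per country and year
input str1 country id year var1
"A" 1 2001 10
"A" 1 2002 12
"A" 1 2003 13
"A" 1 2004 15
"A" 1 2005 18
"B" 2 2001 20
"B" 2 2002 21
"B" 2 2003 25
"B" 2 2004 24
"B" 2 2005 27
end

* Declare the panel structure: id is the panel variable, year the time variable
xtset id year

* Show the pattern of observations per panel
xtdescribe
```

If “xtset” reports repeated time values within a panel, the rows are usually duplicated or not arranged as shown in the table.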

String variable

Problem: A common problem I faced while conducting the analysis in STATA was string variables. If a variable is stored as a string, it cannot be used in most statistical commands.

Solution: A string variable can be converted to a numeric variable using the STATA command “destring” (for numbers stored as text) or “encode” (for text categories such as country names). We can either replace the string variable or create a new variable.
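
The difference between the two commands can be seen in the short sketch below. The dataset is hypothetical: the country names and the gdp figures are invented for illustration only.

```stata
clear
* Hypothetical data: both variables are read in as strings
input str10 country str8 gdp
"Nepal" "21.5"
"India" "2100"
"Nepal" "22.9"
"India" "2300"
end

* encode: text categories become integer codes with value labels
* (codes follow alphabetical order, so India = 1 and Nepal = 2)
encode country, generate(country_id)

* destring: numbers stored as text become numeric; replace overwrites gdp
destring gdp, replace

describe
list, nolabel
```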

Descriptive analysis of panel data

Problem: Since panel data combine time series and cross-sectional data, the usual descriptive statistics are not very informative.

Solution: For descriptive analysis of panel data, I found the “xtsum” command very useful. It presents the overall, “between” and “within” variation in one table.
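
A minimal sketch, assuming internet access for “webuse” and using the Grunfeld example dataset (10 companies over 20 years) rather than my own data:

```stata
* Assumption: internet access, so that webuse can fetch the example data
webuse grunfeld, clear
xtset company year

* Overall, between (across companies) and within (over time) statistics
xtsum invest mvalue kstock
```

In the output, the “between” rows describe the company averages and the “within” rows describe deviations from each company's own average.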

Various tests performed in the analysis

While performing regression analysis using panel data, it is important to check the basic assumptions. These assumptions can be tested using the following tests:

Normality test

One of the basic assumptions of the regression model is normality of the residuals. In STATA, normality can be tested using the following procedure:

  • Run the regression
  • Predict the residuals

Now normality can be checked either with a histogram of the residuals or with the Jarque-Bera test.

Jarque-Bera test:

Null hypothesis: Normality

Alternative hypothesis: Non-normality

In the results, if the p value is not significant at 5% then we cannot reject the null hypothesis, which means the residuals are consistent with normality.
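
The steps above can be sketched as follows. This is an illustration only: it assumes internet access for “webuse”, uses the Grunfeld example dataset with a pooled regression, and computes the Jarque-Bera statistic by hand from the skewness and kurtosis of the residuals, JB = (N/6) * (S^2 + (K - 3)^2 / 4), compared against a chi-squared distribution with 2 degrees of freedom. The names resid, jb and jb_p are my own.

```stata
* Assumption: internet access, so that webuse can fetch the example data
webuse grunfeld, clear

* Step 1: run the regression (pooled OLS, for illustration)
regress invest mvalue kstock

* Step 2: predict the residuals
predict resid, residuals

* Graphical check: histogram with a normal curve overlaid
histogram resid, normal

* Jarque-Bera statistic from the skewness and kurtosis of the residuals
quietly summarize resid, detail
scalar jb = (r(N)/6) * (r(skewness)^2 + ((r(kurtosis) - 3)^2)/4)
scalar jb_p = chi2tail(2, jb)
display "Jarque-Bera = " jb "   p value = " jb_p
```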

Testing for heteroskedasticity

Heteroskedasticity exists when the variance of the error term is not constant across observations, which violates a basic assumption of the regression model. With panel data it is also important to test for heteroskedasticity. In STATA this can be done either graphically using “rvfplot” or numerically through the Breusch-Pagan test.

In the Breusch-Pagan test the null hypothesis is that of homoskedasticity, i.e.

Null hypothesis : Homoskedasticity

Alternative hypothesis : Heteroskedasticity

In the results, if the p value is more than 0.05 (5%) then we cannot reject the null hypothesis of homoskedasticity.
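
Both checks run after an ordinary “regress” command. A minimal sketch, again assuming internet access for “webuse” and using the Grunfeld example dataset in place of my own data:

```stata
* Assumption: internet access, so that webuse can fetch the example data
webuse grunfeld, clear
regress invest mvalue kstock

* Graphical check: residuals against fitted values
rvfplot, yline(0)

* Breusch-Pagan test; the null hypothesis is homoskedasticity
estat hettest
```

A fan-shaped pattern in the plot, or a p value below 0.05 in the test, points to heteroskedasticity.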

Testing for serial correlation

Higher-order serial correlation can be tested using the Breusch-Godfrey test, which can be performed using the following steps:

  • Run the regression
  • Conduct the BG test using the command “estat bgodfrey, lags(1)”, where lags(1) indicates that one lag is used in the test.

In the results, if the p value is not significant then we cannot reject the null hypothesis of “No serial correlation”.

However, if the p value is significant then we reject the null hypothesis, which means that there is serial correlation. One way to deal with serial correlation is to add the lag of the dependent variable as one of the independent variables.
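
The sketch below illustrates these steps under some assumptions. As far as I am aware, “estat bgodfrey” is a time-series command that expects a single series, so the example keeps one company from the Grunfeld example dataset and declares it with “tsset”. It also assumes internet access for “webuse”.

```stata
* Assumption: internet access, so that webuse can fetch the example data
webuse grunfeld, clear

* Keep a single company so that the data form one time series
keep if company == 1
tsset year

* Step 1: run the regression
regress invest mvalue kstock

* Step 2: Breusch-Godfrey test with one lag
estat bgodfrey, lags(1)

* If serial correlation is found, one option is to add the lagged
* dependent variable (L.invest) as a regressor
regress invest L.invest mvalue kstock
```

The second regression loses one observation, because the first year has no lagged value.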

Testing for unit roots

Unit roots in panel data can be tested using either the Levin-Lin-Chu test or the Hadri LM stationarity test. The hypotheses below are those of the Levin-Lin-Chu test; the Hadri LM test reverses them, so its null hypothesis is that all panels are stationary.

Null hypothesis: Panels contain unit roots

Alternative hypothesis: Panels are stationary

In the Levin-Lin-Chu results, if the p value is less than 0.05 then we can reject the null hypothesis and accept the alternative hypothesis. The first difference can be tested for a unit root in the same way. The only thing to keep in mind is that before testing the first difference one must create a new variable (calculated by subtracting the value of the variable in time period t-1 from its value in time period t).

Now, if the results of the unit root test show that the data are stationary, we can go ahead with further analysis. However, if the results show that the data are non-stationary, we can check stationarity in the first difference. If the first difference is also not stationary, we check the second difference, and so on.
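
A minimal sketch of both tests and of the first-difference step, assuming internet access for “webuse” and a balanced panel (the Grunfeld example dataset is used here). The choice of one lag and the variable name d_invest are only illustrative.

```stata
* Assumption: internet access, so that webuse can fetch the example data
webuse grunfeld, clear
xtset company year

* Levin-Lin-Chu test; the null hypothesis is that panels contain unit roots
xtunitroot llc invest, lags(1)

* Hadri LM test; the null hypothesis is that all panels are stationary
xtunitroot hadri invest

* First difference: value in period t minus value in period t-1
generate d_invest = D.invest

* Repeat the Levin-Lin-Chu test on the first difference
xtunitroot llc d_invest, lags(1)
```

Because “xtset” has declared the panel structure, the D. operator takes differences within each company and leaves the first year of each company missing.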

Choosing between random effect and fixed effect in panel data analysis

Another major problem in panel data analysis is choosing between the available models and using the one that suits the data. The choice between the random effect and fixed effect models can be made using the Hausman test, which can be performed in STATA as follows:

Null hypothesis: Random effect model is appropriate.

Alternative hypothesis: Fixed effect model is appropriate.

To run the test:

  • Run the regression (fixed effect).
  • Store the estimates.
  • Run the regression (random effect).
  • Store the estimates
  • Conduct the Hausman test (STATA command: hausman fixed random)

After running the Hausman test, if the p value is significant at 5% then we reject the null hypothesis and accept the alternative hypothesis, i.e. we should use the fixed effect model.
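
The five steps can be sketched as below. The example assumes internet access for “webuse” and uses the Grunfeld example dataset. The names fixed and random are simply labels given to the stored estimates, matching the command quoted above.

```stata
* Assumption: internet access, so that webuse can fetch the example data
webuse grunfeld, clear
xtset company year

* Fixed effect model, then store the estimates
xtreg invest mvalue kstock, fe
estimates store fixed

* Random effect model, then store the estimates
xtreg invest mvalue kstock, re
estimates store random

* Hausman test: the consistent estimator (fixed) is listed first
hausman fixed random
```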


Indra Giri

Senior Analyst at Project Guru
He completed his Masters in Development Economics from South Asian University, New Delhi. His areas of interest include various socio-development issues like poverty, inequality and unemployment in South Asia. Apart from writing for Project Guru he loves to travel and play football in his spare time.