Learn to analyse with STATA

STATA is a software package used for statistical data analysis, data management, and graphical representation. It is used extensively by researchers and professionals in fields such as the social sciences, economics, epidemiology, biostatistics, and finance.

Learning outcomes of this STATA module

  • The distinction between Regular series, Time Series, Univariate, Multivariate, and Panel Data.
  • Establishing important properties of the data, such as moving average and autoregressive components, first differences, and lags.
  • The distinction between Stationary and Non-Stationary Time Series.
  • Testing the time series on the basis of Stationarity, Heteroskedasticity, Autocorrelation, and Stability.
  • Proposing Multivariate analysis on more than one time series.
  • The distinction between Pooled OLS regression and Panel Data Set regression.
  • The choice between Fixed Effect and Random Effect Models.
  • Panel level Heteroskedasticity and Autocorrelation.
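The lag, first-difference, and moving-average operations named in the outcomes above can be illustrated with a little arithmetic. The following is a minimal sketch in Python rather than STATA, meant only to show what each operation computes; the function names and the `gdp` values are hypothetical and do not come from the module's articles.

```python
def lag(series, k=1):
    """Shift the series back by k periods (k >= 1); the first k values are unknown."""
    return [None] * k + series[:-k]


def first_difference(series):
    """Period-to-period change, y[t] - y[t-1]; the first value has no predecessor."""
    return [None] + [curr - prev for prev, curr in zip(series, series[1:])]


def moving_average(series, window=3):
    """Trailing moving average over `window` consecutive periods."""
    out = [None] * (window - 1)  # not enough history for the first few periods
    for end in range(window, len(series) + 1):
        out.append(sum(series[end - window:end]) / window)
    return out


# Hypothetical yearly values, used purely for illustration.
gdp = [100.0, 104.0, 109.0, 115.0, 122.0]

print(lag(gdp))               # each year paired with the previous year's value
print(first_difference(gdp))  # year-on-year change
print(moving_average(gdp))    # smoothed series
```

Differencing removes a steady trend from a series, while a moving average smooths short-run fluctuations; both reappear later in the module under stationarity and ARIMA.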

Getting started with STATA

The articles in this section discuss the different types of data that STATA handles. They also cover the management of dependent and independent variables and how to import data from other formats.

Introduction to STATA

STATA, like SPSS, is a smart data analysis tool used for data management and analysis. It is fast and easy to use, across all... More

How to import data into STATA?

STATA comes with a set of sample data files. This helps the learner in understanding how different sets of tests can be applied to single... More

How to manage variables in STATA?

Data entered in STATA can be classified either as numeric or string type. Associated with each type of data is its storage type i.e. the... More

How to create a do-file in STATA?

A do-file is an interface in STATA that allows the researcher to compile all the commands and results in one place. Once the commands are stored... More

Basic statistical tools

This section comprises articles on the basic statistical tools of correlation and regression. Two types of regression, linear and non-linear, are discussed, along with how they are analyzed in STATA and how the results can be interpreted.

How to do the correlation analysis in STATA?

Correlation analysis is conducted to examine the relationship between dependent and independent variables. There are two types of correlation analysis in STATA.
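As a rough illustration of what a correlation coefficient measures, the sketch below computes Pearson's r by hand in Python. Pearson's r is one common measure; the teaser does not say which two types the article covers, so this is an assumption chosen for illustration. The function name and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance of x and y scaled by their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: hours studied and test score for five students.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

print(round(pearson_r(hours, score), 3))
```

A value close to +1 or -1 indicates a strong linear relationship, and a value near 0 indicates a weak one.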

Procedure and interpretation of linear regression analysis using STATA

Linear regression analysis is conducted to predict the dependent variable based on one or more independent variables.
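The idea of predicting a dependent variable from an independent one can be sketched with ordinary least squares for a single predictor. This is a minimal Python illustration of the arithmetic, not the procedure the article follows in STATA; the function name and the data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the fitted line passes through the means
    return intercept, slope


# Hypothetical data: one independent variable and one dependent variable.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept, slope = fit_line(x, y)
predicted = intercept + slope * 6  # prediction for a new observation with x = 6
print(intercept, slope, predicted)
```

The slope is the estimated change in the dependent variable for a one-unit change in the independent variable, which is the quantity read off a regression coefficient table.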

Non-linear regression analysis in STATA and its interpretation

In the previous article on Linear Regression using STATA, a simple linear regression model was used to test the hypothesis. However, the linear regression will... More

Understanding the correlation and regression analysis values

This article explains the different correlation and regression analysis values that are generated after conducting the tests. Their meaning, importance and how to interpret them... More

Application of multivariate regression analysis

This article explores the concept of multivariate regression analysis along with discussing its assumptions and relevance.

Why is it important to test heteroskedasticity in a dataset?

Heteroskedasticity refers to the state of systematic changes in the spread of residuals or the error term of the model. The presence of heteroskedasticity leads... More

Why conduct a multicollinearity test in econometrics?

A multicollinearity test helps to diagnose the presence of multicollinearity in a model. Multicollinearity refers to a state wherein there exists inter-association or inter-relation between... More

How to test normality statistically?

Data scientists generally prefer to test for normality and to work with normally distributed data, because many standard statistical tests assume it.

How to work with a moderating variable in the regression test with SPSS?

An influencing or control variable is called a moderating variable, and the effect of its interaction with other variables is represented as an interaction effect.

How to work with a mediating variable in a regression analysis?

When the linkage between two variables exists through a middle variable, that middle variable is referred to as a mediating variable.

How to process the primary dataset for a regression analysis?

Regression analysis signifies the extent of the relationship between the dependent and independent variables.

What is the relevance of significant results in regression analysis?

Regression analysis is a statistical technique that helps link variables and determine the strength of the relationship between them.

Time series analysis

This section of the module discusses the analysis of continuous data sets that represent a time dimension. It also covers stationarity, normality, and stability in data. The section introduces univariate and multivariate time series, and explores structural economic model building and the critical appraisal of models based on statistical testing methods.

How to set the ‘Time variable’ for time series analysis in STATA?

Time series analysis works on all structures of data. It comprises methods to extract meaningful statistics and characteristics of data. Time series test is... More

Problem of non-stationarity in time series analysis in STATA

The purpose of this article is to explain the process of determining and creating stationarity in time series analysis. Creating a visual plot of data... More

Solution for non-stationarity in time series analysis in STATA

The previous article, based on the Dickey-Fuller test, established that GDP time series data is non-stationary. This prevented time series analysis from proceeding further.... More

How to build the univariate ARIMA model for time series in STATA?

Autoregressive Integrated Moving Average (ARIMA) is popularly known as the Box-Jenkins method. The emphasis of this method is on analyzing the probabilistic or stochastic properties of a single... More

ARIMA modeling for time series analysis in STATA

In the previous article, all possibilities for performing Autoregressive Integrated Moving Average (ARIMA) modeling for the time series GDP were identified as under. S. No... More

How to predict and forecast using ARIMA in STATA?

After performing Autoregressive Integrated Moving Average (ARIMA) modelling in the previous article: ARIMA modeling for time series analysis in STATA, the time series GDP can be modelled... More

How to test normality in STATA?

Time series data requires some diagnostic tests in order to check the properties of the independent variables. This is called 'normality'. This article explains how... More

How to perform Heteroscedasticity test in STATA for time series data?

Heteroskedastic means “differing variance”, which comes from the Greek words “hetero” ('different') and “skedasis” ('dispersion'). It refers to the variance of the error terms in... More

How to test time series autocorrelation in STATA?

This article shows how to test for serial correlation of errors, or time series autocorrelation, in STATA. The autocorrelation problem arises when error terms in a regression model... More

How to perform point forecasting in STATA?

This article explains how to perform point forecasting in STATA, where one can generate forecast values even without performing ARIMA.

How to perform regression analysis using VAR in STATA?

In multivariate time series, the prominent method of regression analysis is Vector Auto-Regression (VAR). It is important to understand VAR for more clarity.

Lag selection and cointegration test in VAR with two variables

The previous article showed that the three time series Gross Domestic Product (GDP), Gross Fixed Capital Formation (GFC) and Private Final Consumption (PFC) are non-stationary.... More

How to perform Johansen cointegration test in VAR with three variables?

The previous article showed lag selection and stationarity for Vector Auto Regression (VAR) with three variables: Gross Domestic Product (GDP), Gross Fixed Capital Formation (GFC)... More

How to perform Johansen cointegration test?

To test cointegration, the Johansen cointegration test is widely used; it determines the number of independent linear combinations (k) for a set of (m) time series variables that... More

How to perform Granger causality test in STATA?

Applying the Granger causality test in addition to a cointegration test like Vector Autoregression (VAR) helps detect the direction of causality. It also helps to identify which... More

VECM in STATA for two cointegrating equations

Unrestricted Vector Auto Regression (VAR) is not applicable in such cases. Vector Error Correction Model (VECM) is a special case of VAR which takes into... More

How to test and diagnose VECM in STATA?

This article explains testing and diagnosing VECM in STATA to ascertain whether this model is correct or not. Among diagnostic tests, common ones are tested... More

How to identify ARCH effect for time series analysis in STATA?

Volatility represents high variability in a series over time. This article explains the issue of volatility in data using Autoregressive Conditional Heteroscedasticity (ARCH) model.... More

ARCH model for time series analysis in STATA

The previous article showed how to initiate the AutoRegressive Conditional Heteroskedasticity (ARCH) model on a financial stock return time series for the period 1990 to 2016. It... More

Introduction to the Autoregressive Integrated Moving Average (ARIMA) model

Autoregressive Integrated Moving Average (ARIMA) is a statistical tool with a standard structure that, though simple, provides useful information about the stock market.

Panel data analysis

This section introduces panel data and its estimation. The articles focus on pooled OLS regression, fixed effect models, and random effect models. The choice between fixed effect and random effect models, and panel-level heteroskedasticity and autocorrelation, are also covered.

What is panel data analysis in STATA?

This article of the module explains how to perform panel data analysis using STATA. In the case of panel data, the observations are present in... More

Performing pooled panel data regression in STATA

The underlying assumption in pooled regression is that space and time dimensions do not create any distinction within the observations and there are no set... More

How to perform Panel data regression for random effect model in STATA?

The previous article (Pooled panel data regression in STATA) showed how to conduct pooled regression analysis with dummies of 30 American companies. The results revealed... More

Problems faced during statistical analysis using panel data with STATA

In one of my recent projects I had to use panel data analysis. During the data analysis I faced some problems which may be the... More