Learn to analyse with
STATA is a software package for statistical data analysis, data management, and graphical representation. It is used extensively by researchers and professionals in the social sciences, economics, epidemiology, biostatistics, and finance.
Learning outcomes of this STATA module
- The distinction between Regular series, Time Series, Univariate, Multivariate, and Panel Data.
- Establishing important properties of data such as moving averages, autoregressive terms, first differences, and lags.
- The distinction between Stationary and Non-Stationary Time Series.
- Testing the time series on the basis of Stationarity, Heteroskedasticity, Autocorrelation, and Stability.
- Proposing multivariate analysis on more than one time series.
- The distinction between Pooled OLS regression and Panel Data Set regression.
- The choice between Fixed Effect and Random Effect Models.
- Panel level Heteroskedasticity and Autocorrelation.
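As a small illustration of the lag, first-difference, and moving-average properties listed above, here is a minimal Python sketch (not STATA code). The series values and the helper names `lag`, `first_difference`, and `moving_average` are made up for this example; in STATA the lag and difference of a declared time series are normally obtained with the `L.` and `D.` operators after `tsset`.

```python
# Illustrative series (made-up values, not data from this module).
series = [100.0, 102.0, 105.0, 103.0, 108.0, 112.0]


def lag(values, k=1):
    """Shift the series back by k >= 1 periods; the first k entries have no lagged value."""
    return [None] * k + values[:-k]


def first_difference(values):
    """Compute y_t - y_{t-1}; the first entry is undefined."""
    return [None] + [curr - prev for prev, curr in zip(values, values[1:])]


def moving_average(values, window=3):
    """Trailing moving average over `window` periods; early entries are undefined."""
    out = [None] * (window - 1)
    for i in range(window - 1, len(values)):
        out.append(sum(values[i - window + 1 : i + 1]) / window)
    return out


lagged = lag(series)
diffed = first_difference(series)
smoothed = moving_average(series, window=3)

print(lagged)
print(diffed)
print(smoothed)
```

First differencing is the step most often used to turn a trending, non-stationary series into one that can be tested and modelled as stationary.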
Getting started with STATA
The articles in this section discuss the different types of data that STATA handles. They also cover the management of dependent and independent variables and how to import data from other formats.
STATA, like SPSS, is a smart data analysis tool used for data management and analysis. It is a fast and easy to use, across all...
STATA comes with a set of sample data files. This helps the learner in understanding how different sets of tests can be applied to single...
Data entered in STATA can be classified as either numeric or string type. Associated with each type of data is its storage type, i.e. the...
A do-file is an interface in STATA that allows the researcher to compile all the commands and results in one place. Once the commands are stored...
Basic statistical tools
This section comprises articles on the basic statistical tools of correlation and regression. Two types of regression, linear and non-linear, are discussed, along with how they are analyzed in STATA and how the results can be interpreted.
Correlation analysis is conducted to examine the relationship between dependent and independent variables. There are two types of correlation analysis in STATA.
Linear regression analysis is conducted to predict the dependent variable based on one or more independent variables.
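To make the idea of predicting a dependent variable from an independent variable concrete, the following is a minimal Python sketch of ordinary least squares with a single predictor. It is an illustration only, not the STATA procedure from the article; the data points and the function name `ols_simple` are hypothetical.

```python
def ols_simple(x, y):
    """Fit y = intercept + slope * x by ordinary least squares (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sum of cross-deviations / sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up observations for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = ols_simple(x, y)
predicted = intercept + slope * 6.0  # prediction for a new value x = 6

print(intercept, slope, predicted)
```

With more than one independent variable the same principle applies, but the coefficients are solved jointly rather than one at a time.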
In the previous article on Linear Regression using STATA, a simple linear regression model was used to test the hypothesis. However, the linear regression will...
Heteroskedasticity refers to the state of systematic changes in the spread of residuals or the error term of the model. The presence of heteroskedasticity leads...
A multicollinearity test helps to diagnose the presence of multicollinearity in a model. Multicollinearity refers to a state wherein there exists inter-association or inter-relation between...
Data scientists prefer to test for normality and work with normally distributed data, because many standard statistical tests assume it.
An influencing or control variable is said to be a moderating variable, and the effect of its interaction with other variables is represented as an interaction effect.
When the linkage between two variables exists through a middle variable, this middle variable is referred to as a mediating variable.
Regression analysis signifies the extent of the relationship between the dependent and independent variables.
Regression analysis is a statistical technique that links variables and determines the strength of the relationship between them.
Time series analysis
This section of the module discusses the analysis of continuous data sets that represent time dimensions. It also covers stationarity, normality, and stability in data. The section is designed to introduce univariate and multivariate time series. It explores structural economic model building and the critical appraisal of models based on statistical testing methods.
Time series analysis works on all structures of data. It comprises methods to extract meaningful statistics and characteristics of data. Time series test is...
The purpose of this article is to explain the process of determining and creating stationarity in time series analysis. Creating a visual plot of data...
The previous article based on the Dickey Fuller test established that GDP time series data is non-stationary. This prevented time series analysis from proceeding further....
Autoregressive Integrated Moving Average (ARIMA) is popularly known as the Box-Jenkins method. The emphasis of this method is on analyzing the probabilistic or stochastic properties of a single...
In the previous article, all possibilities for performing Autoregressive Integrated Moving Average (ARIMA) modeling for the time series GDP were identified as under. S. No...
After performing Autoregressive Integrated Moving Average (ARIMA) modelling in the previous article: ARIMA modeling for time series analysis in STATA, the time series GDP can be modelled...
Time series data requires some diagnostic tests in order to check the properties of the independent variables. This is called 'normality'. This article explains how...
Heteroskedastic means “differing variance”, which comes from the Greek words “hetero” ('different') and “skedasis” ('dispersion'). It refers to the variance of the error terms in...
This article shows how to test for serial correlation of errors, or time series autocorrelation, in STATA. The autocorrelation problem arises when error terms in a regression model...
This article explains how to perform point forecasting in STATA, where one can generate forecast values even without performing ARIMA.
In multivariate time series, the prominent method of regression analysis is Vector Auto-Regression (VAR). It is important to understand VAR for more clarity.
The previous article showed that the three time series Gross Domestic Product (GDP), Gross Fixed Capital Formation (GFC) and Private Final Consumption (PFC) are non-stationary....
The previous article showed lag selection and stationarity for Vector Auto Regression (VAR) with three variables: Gross Domestic Product (GDP), Gross Fixed Capital Formation (GFC)...
To test cointegration, the Johansen cointegration test is widely used. It determines the number of independent linear combinations (k) for a set of (m) time series variables that...
Applying Granger causality test in addition to cointegration test like Vector Autoregression (VAR) helps detect the direction of causality. It also helps to identify which...
Unrestricted Vector Auto Regression (VAR) is not applicable in such cases. Vector Error Correction Model (VECM) is a special case of VAR which takes into...
This article explains testing and diagnosing VECM in STATA to ascertain whether this model is correct or not. Among diagnostic tests, common ones are tested...
Volatility only represents a high variability in a series over time. This article explains the issue of volatility in data using Autoregressive Conditional Heteroscedasticity (ARCH) model....
The previous article showed how to initiate the AutoRegressive Conditional Heteroskedasticity (ARCH) model on a financial stock return time series for the period 1990 to 2016. It...
Autoregressive Integrated Moving Average (ARIMA) is a statistical tool with a standard structure that, though simple, provides useful information about the stock market.
Panel data analysis
This section is designed to introduce panel data and its estimation. The articles focus on pooled OLS regression, fixed effect models, and random effect models. The choice between fixed effect and random effect models, and panel-level heteroskedasticity and autocorrelation, are also covered.
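The difference between pooled OLS and a fixed effect model can be sketched in a few lines of Python (an illustration only, not STATA output). The toy panel below is made up: two hypothetical entities, A and B, share the same true slope of 2 but have different intercepts. Pooled OLS ignores the entity dimension, while the fixed effect (within) estimator subtracts each entity's own mean before fitting. The helper names `slope` and `demean_by_group` are assumptions for this example.

```python
from collections import defaultdict


def slope(x, y):
    """OLS slope of y on x with a single predictor."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def demean_by_group(groups, values):
    """Subtract each group's mean from its observations (within transformation)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for g, v in zip(groups, values):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for g, v in zip(groups, values)]


# Toy panel: y = 1 + 2x for entity A, y = -10 + 2x for entity B (made-up data).
entity = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.0, 5.0, 7.0, -2.0, 0.0, 2.0]

# Pooled OLS treats all six observations as one cross-section.
pooled_slope = slope(x, y)

# Fixed effect (within) estimator removes entity-specific intercepts first.
within_slope = slope(demean_by_group(entity, x), demean_by_group(entity, y))

print(pooled_slope, within_slope)
```

In this constructed example the pooled slope even has the wrong sign, because the differences between the two entities dominate the relationship within each entity. That is the kind of distortion the choice between pooled, fixed effect, and random effect models is meant to address.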
This article of the module explains how to perform panel data analysis using STATA. In the case of panel data, the observations are present in...
The underlying assumption in pooled regression is that space and time dimensions do not create any distinction within the observations and there are no set...
The previous article (Pooled panel data regression in STATA) showed how to conduct pooled regression analysis with dummies of 30 American companies. The results revealed...
In one of my recent projects I had to use panel data analysis. During the data analysis I faced some problems which may be the...