Learn to analyse with
STATA is a software package used for statistical data analysis, data management, and graphical representation. It is extensively used by researchers and professionals in the social sciences, economics, epidemiology, biostatistics, and finance.
Learning outcomes of this STATA module
- The distinction between Regular series, Time Series, Univariate, Multivariate, and Panel Data.
- Establishment of important properties of data such as moving averages, autoregressive terms, first differencing, and lags.
- The distinction between Stationary and Non-Stationary Time Series.
- Testing the time series on the basis of Stationarity, Heteroskedasticity, Autocorrelation, and Stability.
- Proposing Multivariate analysis on more than one time series.
- The distinction between Pooled OLS regression and Panel Data Set regression.
- The choice between Fixed Effect and Random Effect Models.
- Panel level Heteroskedasticity and Autocorrelation.
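The lag, first-difference, and moving-average properties named in the outcomes above can be illustrated with a minimal sketch. It is written in Python rather than STATA syntax, the function names are hypothetical, and the GDP figures are made-up values used only for illustration.

```python
# Hypothetical helper names for illustration; these are not STATA commands.

def lag(series, k=1):
    """Return the series shifted by k periods; the first k entries are None."""
    k = min(k, len(series))
    if k <= 0:
        return list(series)
    return [None] * k + list(series[:-k])


def first_difference(series):
    """Return y_t - y_{t-1}; the first observation has no predecessor."""
    if not series:
        return []
    return [None] + [series[t] - series[t - 1] for t in range(1, len(series))]


def moving_average(series, window=3):
    """Return the trailing mean over the last `window` observations."""
    out = []
    for t in range(len(series)):
        if t + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(series[t - window + 1:t + 1]) / window)
    return out


gdp = [100.0, 104.0, 109.0, 115.0, 122.0]  # made-up values

print(lag(gdp, 1))
print(first_difference(gdp))
print(moving_average(gdp, 3))
```

A trending series like this one has first differences that fluctuate around a more stable level, which is why differencing is a common step when working towards a stationary series.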
Getting started with STATA
The articles in this section discuss the different types of data that STATA handles. They also cover the management of dependent and independent variables and how to import data from other formats.
Basic statistical tools
This section comprises articles on the basic statistical tools of correlation and regression. Two types of regression, linear and non-linear, are discussed, along with how they are analyzed in STATA and how the results can be interpreted.
Correlation analysis is conducted to examine the relationship between dependent and independent variables. There are two types of correlation analysis in STATA.
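As a rough illustration of what a correlation coefficient measures, the sketch below computes the Pearson coefficient by hand in Python. Pearson is an assumption here, chosen as one common measure; the function name and the data are hypothetical and are not taken from the articles.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Made-up data: an independent variable and a dependent variable.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

print(round(pearson_r(hours, score), 4))
```

The coefficient lies between -1 and 1; values near either end indicate a strong linear relationship, and values near 0 indicate a weak one.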
Linear regression analysis is conducted to predict the dependent variable based on one or more independent variables.
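A minimal sketch of simple linear regression with one independent variable, using the closed-form ordinary least squares formulas, may help make the idea of prediction concrete. It is plain Python rather than STATA output, and the function names and data are hypothetical.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must vary to estimate a slope")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predicted value of the dependent variable at x_new."""
    return intercept + slope * x_new


# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept, slope = fit_simple_ols(x, y)
print(intercept, slope)
print(predict(intercept, slope, 6))
```

With more than one independent variable the same least squares principle applies, but the estimates are obtained by solving a system of equations rather than from these two formulas.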
In the previous article on Linear Regression using STATA, a simple linear regression model was used to test the hypothesis. However, the linear regression will...
This article explains the different correlation and regression analysis values that are generated after conducting the tests. Their meaning, importance and how to interpret them...
This article explores the concept of multivariate regression analysis along with discussing its assumptions and relevance.
Heteroskedasticity refers to the state of systematic changes in the spread of residuals or the error term of the model. The presence of heteroskedasticity leads...
A multicollinearity test helps to diagnose the presence of multicollinearity in a model. Multicollinearity refers to a state wherein there exists inter-association or inter-relation between...
Data scientists often prefer to test for normality and to work with normally distributed data, because many standard statistical tests assume it.
Influencing or control variables are referred to as moderating variables, and the effect of their interactions is represented as an interaction effect.
When the link between two variables operates through a middle variable, that middle variable is referred to as a mediating variable.
Regression analysis signifies the extent of the relationship between the dependent and independent variables.
Regression analysis is a statistical technique that links variables and determines the strength of the relationship between them.
Time series analysis
This section of the module discusses the analysis of continuous data sets that represent a time dimension. It also covers stationarity, normality, and stability in data. The section introduces univariate and multivariate time series, and explores structural economic model building and the critical appraisal of models based on statistical testing methods.
Time series analysis works on all structures of data. It comprises methods to extract meaningful statistics and characteristics of data. Time series test is...
The purpose of this article is to explain the process of determining and creating stationarity in time series analysis. Creating a visual plot of data...
The previous article, based on the Dickey-Fuller test, established that GDP time series data is non-stationary. This prevented time series analysis from proceeding further....
Autoregressive Integrated Moving Average (ARIMA) is popularly known as the Box-Jenkins method. The emphasis of this method is on analyzing the probabilistic or stochastic properties of a single...
Heteroskedastic means “differing variance” which comes from the Greek word “hetero” ('different') and “skedasis” ('dispersion'). It refers to the variance of the error terms in...
This article explains how to perform point forecasting in STATA, where one can generate forecast values even without performing ARIMA.
In multivariate time series, the prominent method of regression analysis is Vector Auto-Regression (VAR). It is important to understand VAR for more clarity.
The previous article showed that the three time series Gross Domestic Product (GDP), Gross Fixed Capital Formation (GFC) and Private Final Consumption (PFC) are non-stationary....
The previous article showed lag selection and stationarity for Vector Auto Regression (VAR) with three variables: Gross Domestic Product (GDP), Gross Fixed Capital Formation (GFC)...
Volatility only represents a high variability in a series over time. This article explains the issue of volatility in data using the Autoregressive Conditional Heteroskedasticity (ARCH) model....
Autoregressive Integrated Moving Average (ARIMA) is a statistical tool with a standard structure that, although simple, provides useful information about the stock market.
Panel data analysis
This section introduces panel data and its estimation. The articles focus on pooled OLS regression, fixed effect models and random effect models. The choice between fixed effect and random effect models, and panel-level heteroskedasticity and autocorrelation, are also covered.
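The difference between pooled OLS and a fixed effect model can be sketched with a toy panel. The example below is plain Python, not STATA syntax; the function names are hypothetical and the data are made up so that each firm has its own intercept but the same slope of 2. It covers only the fixed effect (within) estimator, not random effects.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Slope of a simple least squares fit of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def demean_by_group(groups, values):
    """Subtract each group's own mean (the 'within' transformation)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for g, v in zip(groups, values):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for g, v in zip(groups, values)]


# Made-up panel: y = firm intercept + 2 * x, with intercepts 1 (A) and 10 (B).
firm = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.0, 5.0, 7.0, 18.0, 20.0, 22.0]

# Pooled OLS ignores the firm identity entirely.
pooled_slope = ols_slope(x, y)

# The fixed effect estimator removes firm-level means before fitting.
within_slope = ols_slope(demean_by_group(firm, x), demean_by_group(firm, y))

print(pooled_slope, within_slope)
```

In this constructed example the pooled slope is inflated because it attributes the gap between the two firms' intercepts to x, while the within estimator recovers the common slope.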
The previous article (Pooled panel data regression in STATA) showed how to conduct pooled regression analysis with dummies of 30 American companies. The results revealed...
In one of my recent projects I had to use panel data analysis. During the data analysis I faced some problems which may be the...