Variable management in STATA

Data entered in STATA can be classified as either numeric or string type. Each type of data has an associated storage type: numbers are stored as byte, int, long, float, or double. STATA takes “float” as the default storage type for numeric variables. Byte, int and long are used to hold integers, while float and double can also hold numbers with decimals. The table below gives the minimum and maximum value that each storage type can hold, along with its size in bytes.

Storage type   Minimum                   Maximum                  Closest to 0 without being 0   Bytes
byte           -127                      100                      +/-1                           1
int            -32,767                   32,740                   +/-1                           2
long           -2,147,483,647            2,147,483,620            +/-1                           4
float          -1.70141173319 x 10^38    1.70141173319 x 10^38    +/-10^-38                      4
double         -8.9884656743 x 10^307    8.9884656743 x 10^307    +/-10^-323                     8
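The storage types in the table can be requested explicitly when a variable is created. The following is a minimal sketch, not taken from the original article; the variable names (age, income, pop, ratio, precise) are hypothetical and chosen only for illustration.

```stata
* Start from an empty data set with three observations
clear
set obs 3

* Request a storage type explicitly by naming it before the variable
generate byte   age     = 25
generate int    income  = 32000
generate long   pop     = 2000000000
generate double precise = 0.1

* With no storage type given, generate falls back to the default (float)
generate ratio = 0.5

* The "storage type" column of describe shows the result
describe
```

Running describe after each step is a quick way to confirm which storage type a variable actually received.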

Strings are stored as str#, i.e. str1, str2, str3, ..., str2045, or as strL. The number after str is the maximum length of the string, so str5 holds strings of up to 5 characters.

For example, male would be str4 (since the word ‘male’ has 4 characters) and female would be str6 (since the word ‘female’ has 6 characters).
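The male/female example can be reproduced in a few lines. This is only an illustrative sketch; the variable name gender is an assumption and does not come from a data set used in the article.

```stata
clear
set obs 2

* "male" has 4 characters, so the new variable starts out as str4
generate gender = "male" in 1
describe gender

* "female" needs 6 characters; replace widens the variable to str6
replace gender = "female" in 2
describe gender
```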

Note that since STATA holds the data in memory, storage space should be used judiciously.

For example, the string “female”, with a length of 6, would waste memory if stored as str20. Similarly, a numeric value that fits in a “byte” would waste storage if it is saved as “double”.
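The waste described here can be undone with the compress command. The sketch below deliberately creates oversized variables first; the names gender and code are hypothetical.

```stata
clear
set obs 2

* Deliberately wasteful: str20 for short words, double for a small integer
generate str20 gender = "female"
replace gender = "male" in 2
generate double code = 1
describe

* compress moves each variable to the smallest storage type
* that holds its current values without loss
compress
describe
```

The values themselves are unchanged; only the storage types shrink.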

Converting string to numeric

Once data is entered into STATA, it automatically assigns a storage type to each variable. However, sometimes a variable that holds numerical values is read in as a string variable. Since string variables cannot be used in numerical analysis, such a variable should be converted to numeric.

Check the data set in the “Variables” window (see below), which lists the Name, Label, Type and Format of each variable.

For example, in this case the variable “make” is registered as str18; in order to use it in numerical computations, it has to be converted into a numeric variable.
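The same check can be made from the command line. The sketch below assumes the example uses STATA's bundled auto data set (loaded with sysuse), in which “make” is a str18 variable; the article does not name its data set, so treat this as an assumption.

```stata
* Load the example data shipped with STATA (assumed to match the article)
sysuse auto, clear

* describe lists the storage type, display format and label of a variable
describe make

* confirm stops with an error if make is not a string variable
confirm string variable make

* The storage type can also be displayed directly
display "`: type make'"
```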

Figure: Variable list in STATA

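To convert, use the command “destring”. This command converts string variables that contain numbers into numeric variables (the reverse conversion, from numeric to string, is done with “tostring”). Note that destring only converts strings whose contents are numeric; if a variable holds text, STATA reports that it contains nonnumeric characters and leaves it unchanged. Strings can be converted into numeric in two ways.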

  1. Replace the string variable.
  2. Create a new variable in numeric.

In order to replace “make” with its numeric version, we will use the command:

destring make, replace

To generate a new variable, use the command:

destring make, gen(make1)

where make1 is the new variable in the numeric form.
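Both forms of destring can be tried on a small string variable that holds only digits. The sketch below is illustrative: price_str and price_num are hypothetical names, and the three values are made up for the example.

```stata
* A small data set with numbers stored as strings
clear
input str5 price_str
"4099"
"4749"
"3799"
end

* Option 2: keep the string and create a new numeric variable
destring price_str, generate(price_num)
describe price_str price_num

* Option 1: convert the string variable in place
destring price_str, replace
describe price_str

* Both variables are now numeric, so they can be summarized
summarize price_str price_num
```

Using generate() first is the safer choice, since the original string variable stays available for comparison.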

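Other commands which can be used are:

  1. compress to reduce the memory used by the data, by storing each variable in the smallest storage type that holds it without loss.
  2. destring to convert string to numeric (and tostring to convert numeric to string).
  3. format to set the output format.
  4. recast to change the storage type.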
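The commands in this list can be sketched together as follows. The example assumes STATA's bundled auto data set; rep78_str is a hypothetical name for the new string variable.

```stata
sysuse auto, clear

* compress: shrink every variable to its smallest lossless storage type
compress

* recast: change the storage type of price to long
recast long price

* format: show price with two decimals (the stored values do not change)
format price %9.2f

* tostring: make a string copy of the numeric variable rep78
tostring rep78, generate(rep78_str)

describe price rep78 rep78_str
```

Here compress is run first on purpose: run after recast, it would move price back to a smaller storage type.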

Working with Variable Manager

In the main STATA window, click on “Variable Manager” to manage variables.

Figure: Variable Manager icon in the STATA main window

A new window will open:

Figure: Variable Manager in STATA, used to manage variable properties (label, format, type)

For each variable, the following properties are shown on the right-hand side and can be edited:

  • name,
  • label,
  • type,
  • format,
  • value label and
  • notes (if any).
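The properties listed above can also be set from the command line. The sketch below assumes the bundled auto data set; the new name model, the label text and the note text are hypothetical and chosen only for illustration.

```stata
sysuse auto, clear

* name: rename the variable
rename make model

* label: attach a descriptive variable label
label variable model "Make and model of car"

* format: left-align the string in 18 columns
format model %-18s

* notes: attach a free-text note to the variable
notes model: Renamed from make for illustration

* Review the properties and the note
describe model
notes model
```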

For categorical variables, define the value labels by clicking on “Manage”.

Figure: Variable Manager window in STATA, where variable names and formats can be checked and edited

Click on “Add Value” to add a code for each sub-category of the variable.

For example, to add information about gender, click “Add Value”. A new tab will open, in which you can define the value 1 for Male and 2 for Female.
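The gender coding described above can also be defined with the label commands. This is a minimal sketch; the variable gender, the label name gender_lbl and the four observations are assumptions made for the example.

```stata
* A small numeric variable coded 1 and 2
clear
input byte gender
1
2
2
1
end

* Define the value label: 1 for Male and 2 for Female
label define gender_lbl 1 "Male" 2 "Female"

* Attach the value label to the variable
label values gender gender_lbl

* Tables now show the labels instead of the codes
tabulate gender
```

The stored values remain 1 and 2; the value label only changes how they are displayed.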

However, new variables cannot be added in this window. They can be added in the “Data Editor” (Edit) window instead. See the image below for the Data Editor window:

Figure: Data Editor (Edit) window in STATA, where data can be added

As shown in the figure above, one can manage the variables (modify them individually) by clicking on the “Variable Properties” icon.
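Besides the Data Editor, a new variable can also be created from the command line with generate. The sketch below assumes the bundled auto data set; price_thousands is a hypothetical variable introduced only for illustration.

```stata
sysuse auto, clear

* Create a new variable from an existing one
generate price_thousands = price / 1000

* Give it a descriptive label
label variable price_thousands "Price in thousands of dollars"

* Inspect the first few observations
list make price price_thousands in 1/5
```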

Indra Giri

Senior Analyst at Project Guru
He completed his Master's in Development Economics at South Asian University, New Delhi. His areas of interest include various socio-development issues like poverty, inequality and unemployment in South Asia. Apart from writing for Project Guru, he loves to travel and play football in his spare time.