Introducing data in SPSS

By Priya Chetty on January 17, 2015

If your data isn’t already in a format SPSS can read, you can enter it directly into the SPSS Data Editor (see the article Importing Files and creating datasheet in SPSS). From the menus, choose File > New > Data, which opens the Data Editor in Data View. If you type a number into the first cell, SPSS names that column VAR00001. To assign your own variable names, click the Variable View tab and change them there.

Assigning variable names and properties

In the Name column, enter a unique name for each variable, in the order in which you want to enter the variables. The name must start with a letter; the remaining characters can be letters or digits. A name can’t end with a period, contain blanks or special characters, or be longer than 64 characters.
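The naming rules above can be checked before you start typing names into SPSS. The following is a minimal Python sketch, not part of SPSS itself; the function names are hypothetical, and the pattern is deliberately stricter than SPSS, which also accepts a few extra characters such as underscores.

```python
import re

# Simplified rules from the paragraph above: start with a letter,
# continue with letters or digits, at most 64 characters.
# Assumption: real SPSS accepts some extra characters (for example "_"),
# which this sketch rejects on purpose.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]*")
MAX_NAME_LENGTH = 64


def is_valid_name(name):
    """Return True if `name` satisfies the simplified naming rules."""
    return len(name) <= MAX_NAME_LENGTH and NAME_PATTERN.fullmatch(name) is not None


def find_duplicates(names):
    """Return names that repeat an earlier name, ignoring upper/lower case."""
    seen = set()
    duplicates = []
    for name in names:
        key = name.lower()
        if key in seen:
            duplicates.append(name)
        seen.add(key)
    return duplicates
```

For instance, `is_valid_name("Penalty")` is accepted, while `"my var"` (contains a blank) and `"score."` (ends with a period) are rejected.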

Assigning descriptive labels

  1. Variable labels: Assign descriptive text to a variable by clicking the cell in the Label column and then entering the label. For instance, for the variable “Penalty” the label says “favor or oppose the death penalty for murder.”
  2. Value labels: To label individual values, click the button in the Values column. This opens the Value Labels dialog box. For Penalty, the values are coded 1 = favor, 2 = oppose. The sequence of operations is: enter the value, enter its label, click Add, and repeat this process for each value.

Note: Labels for individual values are useful only for variables with a limited number of categories whose codes aren’t self-explanatory.  You don’t want to attach value labels to individual ages; however, you should label the missing value codes for all variables if you use more than one code.
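The idea behind value labels can be pictured as a lookup table from codes to text. This is a small Python sketch using the Penalty coding from the text; `label_for` is a hypothetical helper, not an SPSS function.

```python
# Value labels for the Penalty variable, as coded in the text above.
PENALTY_LABELS = {1: "favor", 2: "oppose"}


def label_for(code, labels):
    """Return the label attached to a code, or the code itself as text
    when no label was defined (as for self-explanatory values like ages)."""
    return labels.get(code, str(code))


penalty_codes = [1, 2, 2, 1]
penalty_text = [label_for(code, PENALTY_LABELS) for code in penalty_codes]

# An age needs no value label, so it is simply shown as entered.
age_text = label_for(37, {})
```

Here `penalty_text` holds the labels in the same order as the codes, and the unlabeled age is passed through unchanged.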

Assigning missing values

Missing values deserve special attention. This section explains how to tell SPSS which values stand for missing responses. There are two types of missing values: “user-defined missing values” and “system-missing values”.

User-defined missing values: Sometimes a respondent leaves a question unanswered or doesn’t tick any option, or simply writes “I don’t know” or “NA” next to a question in the questionnaire. In such cases, you define the missing values yourself.

The image below shows the missing values in different columns in the survey response (Figure 1).

Figure 1: Missing values

To define missing values, find the “Missing” column in Variable View. Click the first cell under that column, then click the blue box that appears in it, as shown in the image below (Figure 2).

Figure 2: ‘Missing’ Variable

A dialog box opens with three options: No missing values, Discrete missing values, and Range plus one optional discrete missing value (Figure 3).

Figure 3: Missing value options

No missing values

This is the default option in SPSS. If you have not defined any missing values, a code entered for “No Response” is treated as a valid response. For example, if 9 out of 10 respondents answered a question and 1 did not, SPSS still counts 10 respondents for that question under this option. This is statistically incorrect: a person who did not answer a question should not be counted as a respondent for that question at all, so results calculated this way are not reliable.

So, if your data sheet contains non-responses, do not leave this option selected; define the missing values using one of the two options below.
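The effect of the 10-respondent example can be worked through in a few lines. This is an illustrative Python sketch, not SPSS output; the code 9 for “No Response” and the individual answers are assumptions made up for the example.

```python
NO_RESPONSE = 9  # hypothetical code used for a respondent who gave no answer

# Ten questionnaires: nine real answers (1 or 2) and one non-response.
responses = [1, 2, 1, 1, 2, 1, 2, 1, 1, NO_RESPONSE]


def count_valid(values, missing_codes=()):
    """Count the values that are not declared as missing."""
    return sum(1 for value in values if value not in missing_codes)


# "No missing values": the non-response code is counted like any answer.
n_without_definition = count_valid(responses)

# With the code declared as missing, only real answers are counted.
n_with_definition = count_valid(responses, missing_codes={NO_RESPONSE})

# Share of respondents who chose answer 1 under each treatment.
share_without = 100 * responses.count(1) / n_without_definition
share_with = 100 * responses.count(1) / n_with_definition
```

With the code left undefined the base is 10 respondents; once it is declared missing the base drops to 9, and the percentage for every real answer rises accordingly.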

Discrete missing values

With Discrete missing values, the researcher specifies individual codes that represent missing responses. For example, the discrete missing values can be defined as shown in the image below. You can add up to three discrete missing values.

Suppose you distributed 100 questionnaires and 3 people answered “I refuse to answer”, 3 answered “Don’t know” and another 3 answered “NA”. Follow these steps to define the missing values:

Label these responses (Refused to answer, Don’t know, and Not Applicable) in the “Values” column, giving each label a numeric code. The code must not coincide with any other value or code used anywhere in the questionnaire. For example, if your questionnaire has 10 questions and each question has at most 6 response options, you could give the 3 labels the following codes:

  • Refused to answer = 7
  • Don’t know = 8
  • Not Applicable = 9

Then enter these codes as discrete missing values in the “Missing” column. When a frequency analysis is run, these codes are excluded from the valid responses, but SPSS still reports the frequency and percentage of each missing category (see the table below).

Figure 4: Frequency analysis of missing values
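The logic of such a frequency table can be sketched in Python. This is an approximation of the idea rather than a reproduction of SPSS output; the split of the 91 substantive answers into codes 1 and 2 is invented for illustration, and `frequency_table` is a hypothetical helper.

```python
from collections import Counter

# Codes declared as discrete missing values, as in the list above.
MISSING_LABELS = {7: "Refused to answer", 8: "Don't know", 9: "Not Applicable"}


def frequency_table(values, missing_codes):
    """Return (rows, n_valid): one row per code with its count, its percent
    of all cases, and its valid percent (None for missing codes)."""
    counts = Counter(values)
    total = len(values)
    n_valid = sum(n for code, n in counts.items() if code not in missing_codes)
    rows = []
    for code in sorted(counts):
        is_missing = code in missing_codes
        rows.append({
            "code": code,
            "count": counts[code],
            "percent": 100 * counts[code] / total,
            "valid_percent": None if is_missing else 100 * counts[code] / n_valid,
            "missing": is_missing,
        })
    return rows, n_valid


# 100 questionnaires: 91 substantive answers (invented split) plus 3 of each
# missing code.
answers = [1] * 40 + [2] * 51 + [7] * 3 + [8] * 3 + [9] * 3
rows, n_valid = frequency_table(answers, set(MISSING_LABELS))
```

Each missing code still gets its own count and percent of all 100 cases, but the valid percentages are computed on the 91 valid responses only.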

Range plus one optional discrete missing value

Because discrete missing values are limited to three codes, the researcher can specify a range in order to include more. As the image below shows (Figure 5), the researcher has defined a range from 7 to 10 plus one discrete value of 11 (which is optional).

Figure 5: Range plus one optional discrete missing value

However, keep in mind that if you give the range as 15-50, every code you use in that range needs its own label for the output to stay interpretable, which quickly becomes tedious. For that reason, avoid this option when three discrete missing values are enough for your analysis.
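The rule behind this option (a range of codes plus one optional extra code) can be sketched in Python using the 7 to 10 plus 11 example from this section. `make_missing_check` is a hypothetical helper, and the range is assumed to include both endpoints.

```python
def make_missing_check(low, high, discrete=None):
    """Build a function that reports whether a value counts as missing:
    it lies in the range low..high (both ends included) or equals the
    optional discrete value."""
    def is_missing(value):
        in_range = low <= value <= high
        is_discrete = discrete is not None and value == discrete
        return in_range or is_discrete
    return is_missing


# The example from this section: range 7 to 10 plus the discrete value 11.
is_missing = make_missing_check(7, 10, discrete=11)

codes = [3, 6, 7, 10, 11, 12]
valid_codes = [code for code in codes if not is_missing(code)]
```

In this sketch `valid_codes` keeps only 3, 6 and 12; everything from 7 to 11 is treated as missing.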

In addition to user-defined missing values, there are system-missing values. SPSS assigns the system-missing value to any blank numeric cell in the Data Editor and to any calculated value that is not defined. A system-missing value is displayed as a period (.). When SPSS reads such a cell, it treats the value as missing.
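The two sources of system-missing values described above (blank numeric cells and undefined calculations) can be mimicked in Python. In this sketch `None` stands in for the system-missing value; the helper names are hypothetical and the behavior is a simplification, not SPSS’s actual implementation.

```python
SYSMIS = None  # stand-in for the system-missing value, shown as "." in SPSS


def parse_numeric_cell(text):
    """Convert cell text to a number; a blank cell becomes system-missing."""
    stripped = text.strip()
    return SYSMIS if stripped == "" else float(stripped)


def ratio(numerator, denominator):
    """Divide two values; the result is system-missing when it is not
    defined (a missing input or a zero denominator)."""
    if numerator is SYSMIS or denominator is SYSMIS or denominator == 0:
        return SYSMIS
    return numerator / denominator


cells = ["12", "", "7.5"]
values = [parse_numeric_cell(cell) for cell in cells]
```

The blank second cell comes back as `SYSMIS`, and any ratio that involves it, or a zero denominator, is `SYSMIS` as well.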

Note: Missing-value handling is more limited for string variables (variables made of letters instead of numbers, that is, variables for which you have not assigned any code). A blank string cell is not system-missing, and you can’t specify a range of missing values for a string variable. For example, if “Gender” is stored as text rather than coded “Male” = 1, “Female” = 2, a blank Gender cell is treated as a valid value unless you explicitly define it as missing.

Assigning levels of measurement

Click a cell in the Measure column to assign a level of measurement to each variable. You have three choices: nominal, ordinal, and scale (see Difference between Nominal, Ordinal and Scale).

Warning 1: If you don’t specify the level of measurement, SPSS tries to infer it from the characteristics of the data, but its judgment in this matter is fallible.
Warning 2: Although SPSS assigns a level of measurement (Scale, Nominal or Ordinal) to each variable, do not rely on it blindly. SPSS will let you calculate means for nominal variables as long as they have numeric values. Certain statistical procedures don’t allow string variables in particular fields of their dialog boxes. For example, you can’t calculate the mean of a string variable.
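Warning 2 can be illustrated with a small Python sketch: a mean of nominal codes is computable but meaningless, so a careful analyst checks the level of measurement first. The variable names, data and `checked_mean` helper are hypothetical; SPSS itself performs no such check.

```python
from statistics import mean

# Hypothetical levels of measurement, as they might be set in the Measure column.
MEASURE = {"Age": "scale", "Gender": "nominal"}

# Hypothetical data: Gender is coded 1 = Male, 2 = Female.
DATA = {"Age": [23, 35, 41, 29], "Gender": [1, 2, 2, 1]}


def checked_mean(variable):
    """Return the mean of a variable, refusing anything that is not scale."""
    if MEASURE[variable] != "scale":
        raise ValueError(f"{variable} is {MEASURE[variable]}; a mean is not meaningful")
    return mean(DATA[variable])


# The arithmetic works on the codes, but 1.5 does not describe any gender.
unchecked_gender_mean = mean(DATA["Gender"])
```

Calling `checked_mean("Age")` returns the average age, while `checked_mean("Gender")` raises an error instead of returning a misleading number.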

Saving the data file

Save your data file periodically so that you don’t have to start from scratch if anything goes wrong.
