Interpretation of factor analysis using SPSS

By Priya Chetty on February 5, 2015

The previous article discussed factor analysis and how it should be conducted using SPSS. This article explains how the output of factor analysis can be interpreted.

Interpreting factor analysis in SPSS

Descriptive statistics

The first output from the analysis is a table of descriptive statistics for all the variables under investigation. Typically, the mean, standard deviation, and the number of respondents (N) who participated in the survey are given. The mean value is the average response to each variable across the dataset, so there is no minimum value required. Looking at the mean values in Table 1 below, ‘respectability of product’ has the highest mean, which suggests that it is the most important variable influencing customers to buy the product. The lowest mean, 2.42 for ‘cost of the product’, indicates that respondents on average tended to disagree that cost plays a role. The role of every other variable in consumers’ decisions to buy a product can be interpreted in a similar way.
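As a rough illustration of what the descriptive statistics table contains, the sketch below computes the mean, standard deviation, and N for two made-up survey items. The item names and responses are hypothetical, and the sample (n - 1) form of the standard deviation is an assumption about how the figures are calculated.

```python
import statistics

# Hypothetical responses on a 5-point scale (1 = strongly disagree, 5 = strongly agree).
responses = {
    "item_a": [4, 5, 3, 4, 5, 4],
    "item_b": [2, 3, 2, 1, 3, 2],
}


def describe(values):
    """Return the mean, sample standard deviation (n - 1 denominator), and N."""
    return {
        "mean": statistics.mean(values),
        "std_dev": statistics.stdev(values),
        "n": len(values),
    }


summary = {name: describe(values) for name, values in responses.items()}
for name, stats in summary.items():
    print(f"{name}: mean={stats['mean']:.2f}, sd={stats['std_dev']:.2f}, N={stats['n']}")
```

An item with a higher mean attracted more agreement on average, which is the comparison made when reading the table.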

The correlation matrix

The next output from the analysis is the correlation matrix. A correlation matrix is simply a square array of numbers that gives the correlation coefficients between each variable and every other variable in the investigation. The correlation coefficient between a variable and itself is always 1, hence the principal diagonal of the correlation matrix contains 1s (see the red line in Table 2 below). The matrix is symmetric, so the correlation coefficients above and below the principal diagonal are the same. The determinant of the correlation matrix is shown at the foot of the table below.

With respect to the correlation matrix, if a pair of variables has a correlation coefficient of less than 0.5, consider dropping one of them from the analysis. In that case, the factor analysis needs to be rerun without the excluded variable. By contrast, off-diagonal values that are very small (close to zero) are what a good model shows in the anti-image correlation matrix, which holds the negatives of the partial correlations, not in the correlation matrix itself, where the variables need to be correlated for factor analysis to be worthwhile.
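The properties of the correlation matrix described in this section can be checked with a short sketch. The data are randomly generated and purely hypothetical (12 respondents, 3 items), and NumPy is used here as a stand-in for the SPSS output.

```python
import numpy as np

# Hypothetical data: 12 respondents (rows) by 3 survey items (columns).
rng = np.random.default_rng(0)
base = rng.normal(size=(12, 1))
data = np.hstack([
    base + 0.5 * rng.normal(size=(12, 1)),  # item 1, related to item 2
    base + 0.5 * rng.normal(size=(12, 1)),  # item 2, related to item 1
    rng.normal(size=(12, 1)),               # item 3, generated independently
])

# rowvar=False treats each column as a variable.
corr = np.corrcoef(data, rowvar=False)

print(np.round(corr, 3))
print("Symmetric:", np.allclose(corr, corr.T))
print("Diagonal all 1s:", np.allclose(np.diag(corr), 1.0))
print("Determinant:", round(float(np.linalg.det(corr)), 4))
```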

Kaiser-Meyer-Olkin (KMO) and Bartlett’s test (measure the strength of the relationship among the variables)

The KMO statistic measures sampling adequacy, that is, whether the responses in the sample are adequate for factor analysis. It should be at least 0.5 for a satisfactory factor analysis to proceed. Kaiser (1974) recommends 0.5 as a minimum (barely accepted), values between 0.7 and 0.8 as acceptable, and values above 0.9 as superb. Looking at Table 3 below, the KMO measure is 0.417, which falls just short of the 0.5 minimum. It is treated here as barely acceptable because it is close to 0.5, but the results should be read with caution.
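For readers who want to see where the KMO figure comes from, here is a minimal sketch of the overall KMO calculation from a correlation matrix. It compares the squared correlations with the squared partial correlations, which are obtained from the inverse of the correlation matrix. The 3 x 3 matrix is hypothetical, and `kmo_overall` is an illustrative helper, not an SPSS function.

```python
import numpy as np


def kmo_overall(corr):
    """Overall Kaiser-Meyer-Olkin measure computed from a correlation matrix."""
    inv = np.linalg.inv(corr)
    # Partial correlation between i and j, controlling for all other variables.
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)      # squared correlations
    p2 = np.sum(partial[off_diag] ** 2)   # squared partial correlations
    return r2 / (r2 + p2)


# Hypothetical correlation matrix for three variables.
corr = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])

kmo = kmo_overall(corr)
print(f"KMO = {kmo:.3f}")
print("Meets the 0.5 minimum:", kmo >= 0.5)
```

The measure approaches 1 when the partial correlations are small relative to the ordinary correlations.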

There is no definitive answer to the question “How many respondents do I need for factor analysis?”, and methodologies differ. A common rule of thumb is to have at least 10-15 participants per variable. Fiedel (2005) says that, in general, a sample of over 300 respondents is probably adequate. There is general agreement that factor analysis is inappropriate when the sample size is below 50.

Bartlett’s test is another indication of the strength of the relationship among variables. It tests the null hypothesis that the correlation matrix is an identity matrix. An identity matrix is a matrix in which all of the diagonal elements are 1 (like the diagonal in Table 2) and all off-diagonal elements (term explained above) are 0, which would mean that the variables are uncorrelated. You want to reject this null hypothesis. From Table 3, we can see that Bartlett’s Test of Sphericity is significant: the significance value is 0.012, which is less than 0.05 and therefore small enough to reject the null hypothesis. This means that the correlation matrix is not an identity matrix.

Table 3: KMO and Bartlett’s test
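The test statistic behind Bartlett’s test can be sketched as follows. This uses the standard chi-square approximation based on the determinant of the correlation matrix. The correlation matrix and the sample size of 12 are hypothetical inputs, and `bartlett_sphericity` is an illustrative helper rather than an SPSS routine.

```python
import math

import numpy as np


def bartlett_sphericity(corr, n_obs):
    """Return the chi-square statistic and degrees of freedom for Bartlett's test."""
    p = corr.shape[0]
    determinant = np.linalg.det(corr)
    statistic = -(n_obs - 1 - (2 * p + 5) / 6) * math.log(determinant)
    dof = p * (p - 1) // 2
    return statistic, dof


# Hypothetical correlation matrix for three variables and 12 respondents.
corr = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])

statistic, dof = bartlett_sphericity(corr, n_obs=12)
print(f"chi-square = {statistic:.3f}, df = {dof}")

# An identity matrix has determinant 1, so the statistic collapses to zero.
identity_statistic, _ = bartlett_sphericity(np.eye(3), n_obs=12)
print(f"chi-square for an identity matrix = {abs(identity_statistic):.3f}")
```

The significance value is the upper-tail probability of this statistic under a chi-square distribution with the given degrees of freedom; if SciPy is available, `scipy.stats.chi2.sf(statistic, dof)` is one way to obtain it.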

Communalities

The next item from the output is a table of communalities, which shows how much of the variance in each variable has been accounted for by the extracted factors. A communality should be more than 0.5 for the variable to be considered for further analysis; otherwise the variable is removed from further steps of factor analysis. For instance, over 90% of the variance in “Quality of product” is accounted for, while 73.5% of the variance in “Availability of product” is accounted for (Table 4).
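A communality can be reproduced by summing a variable’s squared loadings across the extracted components. The sketch below uses a small, made-up loading matrix (3 variables on 2 components) and applies the 0.5 cut-off mentioned above.

```python
import numpy as np

# Hypothetical loadings: 3 variables (rows) on 2 extracted components (columns).
loadings = np.array([
    [0.90, 0.30],
    [0.70, 0.50],
    [0.20, 0.40],
])

# Communality = sum of squared loadings across the extracted components.
communalities = np.sum(loadings ** 2, axis=1)

# Variables with a communality above 0.5 are kept for further analysis.
keep = communalities > 0.5

for i, (h2, kept) in enumerate(zip(communalities, keep), start=1):
    print(f"variable {i}: communality={h2:.2f}, keep={bool(kept)}")
```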

Total variance explained

An eigenvalue reflects the amount of variance that a factor accounts for; the eigenvalues of all the factors sum to the number of items that are subjected to factor analysis. The next item shows all the factors extractable from the analysis along with their eigenvalues.

The eigenvalue table is divided into three sub-sections:

Initial Eigenvalues
Extraction Sums of Squared Loadings
Rotation Sums of Squared Loadings

For analysis and interpretation purposes we are concerned only with Initial Eigenvalues and Extraction Sums of Squared Loadings. A component is retained only if its eigenvalue is greater than 1. Table 5 shows that the eigenvalue of the 1st component is 3.709 > 1, the 2nd is 1.478 > 1, the 3rd is 1.361 > 1, and the 4th is 0.600 < 1. Thus, the stated set of 8 variables with 12 observations is represented by three components. Further, the % of variance column under Extraction Sums of Squared Loadings shows that the first factor accounts for 46.367% of the variance, the second for 18.471%, and the third for 17.013% (Table 5), or 81.851% in total. Thus, 3 components are enough to represent the characteristics captured by the stated 8 variables.

Component: The first column lists 8 components, one for each of the 8 variables shown in the Communalities table (Table 4).
Initial Eigenvalues Total: The eigenvalue of each component, i.e. the total variance it accounts for.
Initial Eigenvalues % of variance: The percent of variance attributable to each factor.
Initial Eigenvalues Cumulative %: Cumulative variance of the factor when added to the previous factors.
Extraction Sums of Squared Loadings Total: Total variance after extraction.
Extraction Sums of Squared Loadings % of variance: The percent of variance attributable to each factor after extraction. This value is of significance to us, and in this step we determine that there are three factors which contribute to why someone would buy a particular product.
Extraction Sums of Squared Loadings Cumulative %: Cumulative variance of the factor when added to the previous factors after extraction.
Rotation Sums of Squared Loadings Total: Total variance after rotation.
Rotation Sums of Squared Loadings % of variance: The percent of variance attributable to each factor after rotation.
Rotation Sums of Squared Loadings Cumulative %: Cumulative variance of the factor when added to the previous factors after rotation.
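The arithmetic behind the total variance table can be sketched with the four eigenvalues quoted above. Each percentage is the eigenvalue divided by the number of variables (8), and a component is retained when its eigenvalue exceeds 1. Because the quoted eigenvalues are rounded to three decimals, the percentages computed here differ slightly from those in the table.

```python
n_variables = 8
# Eigenvalues of the first four components as quoted from Table 5.
eigenvalues = [3.709, 1.478, 1.361, 0.600]

rows = []
cumulative = 0.0
for component, eigenvalue in enumerate(eigenvalues, start=1):
    pct_variance = 100 * eigenvalue / n_variables
    cumulative += pct_variance
    rows.append((component, eigenvalue, pct_variance, cumulative, eigenvalue > 1))

for component, eigenvalue, pct_variance, cum, kept in rows:
    print(f"{component}: eigenvalue={eigenvalue:.3f}, "
          f"% of variance={pct_variance:.3f}, cumulative %={cum:.3f}, retained={kept}")

# Components whose eigenvalue is greater than 1.
retained = [row[0] for row in rows if row[4]]
print("Retained components:", retained)
```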

Scree plot

The scree plot is a graph of the eigenvalues against all the factors. The graph is useful for determining how many factors to retain. The point of interest is where the curve starts to flatten. It can be seen that the curve begins to flatten between factors 3 and 4. Note also that factors 4 onwards have eigenvalues of less than 1, so only three factors have been retained.

Figure 1: Scree plot

Component matrix

Table 6 below shows the loadings of the eight variables on the three factors extracted. The higher the absolute value of the loading, the more the factor contributes to the variable. Three factors have been extracted, and the 8 items are divided among them according to the component on which each item loads most strongly. The gaps (empty spaces) in the table represent loadings of less than 0.5, which were suppressed to make the table easier to read. Table 6 also shows cross-loading, i.e. one variable loading on more than one component, which prevents each component from being identified precisely. The cross-loading is extensive: cost of the product, popularity of the product, prestige of the product, and quality of the product all cross-load. To derive more adequate results, these cross-loadings need to be reduced. The solution is to redistribute the factor loadings through rotation, and the rotated component matrix is then examined to identify the components.

Rotated component matrix

The idea of rotation is to reduce the number of factors on which the variables under investigation have high loadings. Rotation does not change the overall fit of the solution (the communalities, for example, stay the same); it only redistributes the loadings so that the analysis is easier to interpret. Looking at the table below, we can see that availability of the product and cost of the product are substantially loaded on Factor (Component) 3. In contrast, experience with the product, popularity of the product, and quantity of the product are substantially loaded on Factor 2. Sometimes a variable loads on two or more components, so the factor loading values need to be checked.
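To make the effect of rotation concrete, here is a minimal sketch of a varimax rotation, one common orthogonal method. The article does not say which rotation was used, so varimax is an assumption, and this version omits the Kaiser normalization that some packages apply by default. The loading matrix is hypothetical.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Rotate a loading matrix with the varimax criterion (no Kaiser normalization)."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        column_ss = np.sum(rotated ** 2, axis=0)
        gradient = loadings.T @ (rotated ** 3 - (gamma / n_vars) * rotated * column_ss)
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt  # nearest orthogonal matrix to the gradient
        new_criterion = np.sum(s)
        if criterion != 0.0 and new_criterion / criterion < 1 + tol:
            break
        criterion = new_criterion
    return loadings @ rotation


# Hypothetical unrotated loadings: 4 variables on 2 components.
loadings = np.array([
    [0.7, 0.5],
    [0.8, 0.4],
    [0.4, 0.7],
    [0.3, 0.8],
])

rotated = varimax(loadings)
print(np.round(rotated, 3))

# An orthogonal rotation leaves each variable's communality unchanged.
print(np.allclose(np.sum(loadings ** 2, axis=1), np.sum(rotated ** 2, axis=1)))
```

Only the split of each variable's variance across the components changes, which is why the rotated matrix is easier to read without altering the fit.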

If a variable has a loading above the required value of 0.5 (or a stricter limit, such as 0.6, set by the researcher) on only one component, that variable can be considered for further analysis. A loading above 0.5 (or 0.6) on more than one component means that the variable represents two components and so is not effective in measuring a specific category; such a variable needs to be excluded. In Table 7, experience with the product and quality of the product each load on more than one component, so they cannot be considered for further analysis. Hence, further processing, i.e. impact analysis or any other statistical analysis, includes all variables except experience with the product and quality of the product (Table 7).
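The retention rule described above can be sketched as a simple check on each row of the rotated component matrix. The variable names and loadings are hypothetical, and the handling of a variable with no loading above the threshold is an assumption, since the rule above only covers one loading versus several.

```python
import numpy as np

THRESHOLD = 0.5  # could be raised to 0.6 if the researcher wants a stricter limit

# Hypothetical rotated loadings: 4 variables on 3 components.
variables = ["var_a", "var_b", "var_c", "var_d"]
rotated = np.array([
    [0.85, 0.10, 0.05],
    [0.62, 0.58, 0.12],
    [0.20, 0.15, 0.78],
    [0.30, 0.35, 0.40],
])


def classify(row, threshold=THRESHOLD):
    """Decide whether a variable is kept, based on how many loadings exceed the threshold."""
    high = int(np.sum(np.abs(row) > threshold))
    if high == 1:
        return "retain"
    if high > 1:
        return "exclude (cross-loading)"
    return "exclude (no loading above threshold)"  # assumption, not stated in the article


decisions = {name: classify(row) for name, row in zip(variables, rotated)}
for name, decision in decisions.items():
    print(f"{name}: {decision}")
```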