Understanding the foundations of structural equation modeling

By Riya Jain and Priya Chetty on October 14, 2021

Structural equation modeling (SEM) is a powerful multivariate technique that is increasingly used in scientific investigations to evaluate and test relationships between variables. It differs from simpler modeling methods such as regression and factor analysis in that it can estimate both the direct and the indirect effects among several variables at once. Although it is nearly 100 years old, the methodology has evolved significantly, and a number of new parameters and indicators have been added to it over time. To understand what they represent, it is essential to first understand the terminology behind these values.

The previous article introduced the concept of SEM analysis, its meaning, relevance, and benefits. It is equally important to understand the terminology that forms the basis of SEM analysis.

Basic terminologies of structural equation modeling

Structural equation modeling (SEM) can be carried out in software such as LISREL and SPSS AMOS. Irrespective of the software in use, the figures and values that emerge are read in the same way. The main terms used when formulating a model and analyzing the relationships between variables are described below.

1. Measured or Observed variable

A measured (or observed) variable is one that is measured directly, i.e. numeric data for it are available or can be extracted from other sources for statistical analysis. In a model, a measured variable is represented by a rectangle (USF, 2021).

2. Latent variable

A latent variable is a variable for which no directly measured numeric data are available; it is inferred from the observed variables that serve as its indicators. An SEM model is developed by including such non-measured variables. In a model, a latent variable is represented by an oval (Kenny, 2011; USF, 2021).

3. Measurement error

Measurement error is the error or bias that arises while measuring the value of a variable, i.e. while collecting its data. A measured or observed variable always carries some possibility of such error, and the shortfall of a correlation value from the perfect value of 1.00 reflects it. A latent variable, on the other hand, is treated as free of measurement error because it is not measured by the researcher; its value is derived by the model during computation (Kenny, 2011).

4. Unstandardized estimate

Unstandardized estimates retain the scaling information of the variables and are interpreted with reference to the original scales. They are therefore used when developing the links between variables and assessing the size of an impact in the variables' own units (Suhr, 2006).

5. Standardized estimate

Standardized estimates are the values obtained when the variables are rescaled to a mean of 0 and a variance of 1, i.e. the unstandardized estimates are transformed to remove the scaling information from the model. Removing the scaling matters because, in some cases, the independent variables are measured on different scales. Uneven scaling increases the chance of error when estimates are compared and makes the results harder to interpret. Standardized estimates can thus be used for measuring effect size, but they are mainly used for informal comparisons of parameters (Kenny, 2011; Suhr, 2006). For models in which all independent variables are measured on one scale, however, standardization is not essential.
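The difference between the two kinds of estimate can be illustrated with a simple two-variable regression rather than a full SEM. The sketch below is only an illustration: the data and the function names are hypothetical and do not come from the sources cited above. It assumes the usual rescaling of a slope by the ratio of the two standard deviations.

```python
from statistics import mean, stdev

def unstandardized_slope(x, y):
    """Least-squares slope of y on x, expressed in the original units."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def standardized_slope(x, y):
    """Rescale the slope by sd(x) / sd(y) so that it no longer depends on units."""
    return unstandardized_slope(x, y) * stdev(x) / stdev(y)

# Hypothetical data: income in thousands, satisfaction on a 1-10 scale.
income = [20, 35, 50, 65, 80]
satisfaction = [3, 5, 4, 7, 8]

b = unstandardized_slope(income, satisfaction)
beta = standardized_slope(income, satisfaction)

# Re-expressing income in single currency units changes b but not beta.
income_in_units = [v * 1000 for v in income]
b_rescaled = unstandardized_slope(income_in_units, satisfaction)
beta_rescaled = standardized_slope(income_in_units, satisfaction)

print("unstandardized:", b, "->", b_rescaled)
print("standardized:  ", beta, "->", beta_rescaled)
```

Changing the unit of income changes the unstandardized slope by the same factor, while the standardized slope stays the same, which is why the latter is convenient for comparing parameters measured on different scales.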

6. Endogenous variable

A variable whose value depends on one or more other variables in the model is called an endogenous variable. In more common terms, an endogenous variable is also referred to as a dependent or response variable (Kenny, 2011).

7. Exogenous variable

Exogenous variables are those whose values are determined outside the model and which help determine the values of other variables. They can also be referred to as independent or predictor variables (Kenny, 2011).

8. Direct effect

A direct effect is the directional relationship between two variables, i.e. the impact of one variable on another, marked by the direction of the arrow connecting them. It involves an independent and a dependent variable (Suhr, 2006).

9. Indirect effect

An indirect effect is the effect of an independent variable on another variable that is transmitted through one or more intervening variables. It is measured by building the mediating relationship between the variables into the model (Suhr, 2006).
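For linear models with a single mediating chain, the indirect effect is commonly computed as the product of the path coefficients along that chain, and the total effect as the sum of the direct and indirect effects. The sketch below assumes that rule; the coefficient values and the function name are hypothetical and chosen only for illustration.

```python
from math import prod

def decompose_effects(direct, mediated_paths):
    """Split a total effect into direct and indirect parts.

    direct:         coefficient on the arrow X -> Y
    mediated_paths: coefficients along one mediating chain,
                    e.g. [X -> M, M -> Y]
    """
    indirect = prod(mediated_paths)  # product-of-coefficients rule
    return {"direct": direct, "indirect": indirect, "total": direct + indirect}

# Hypothetical standardized coefficients (not taken from any real model).
effects = decompose_effects(direct=0.30, mediated_paths=[0.50, 0.40])
print(effects)
```

With several mediating chains, each chain would contribute its own product and the indirect effects would be summed; the sketch covers only the single-chain case.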

10. Factor loading

The factor loading estimates how strongly an observed indicator is linked to its latent construct; the square of a standardized loading gives the share of the indicator's variance that is accounted for by the construct. It is often recommended to include only variables with a loading above 0.4, although lower values can also be considered, since they still represent an association. Because a loading reflects the association of one variable with another, it can be positive or negative, depicting the nature of the association (Meyer, 2020).
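The 0.4 guideline can be applied mechanically once the loadings are known. The sketch below uses hypothetical standardized loadings and hypothetical function names. Comparing the absolute value with the cutoff is an assumption made here so that strong negative loadings are retained as well.

```python
def variance_explained(loading):
    """Share of an indicator's variance accounted for by the construct,
    assuming a standardized loading on a single construct."""
    return loading ** 2

def retained_indicators(loadings, cutoff=0.4):
    """Names of the indicators whose loading exceeds the cutoff in absolute value."""
    return [name for name, value in loadings.items() if abs(value) > cutoff]

# Hypothetical standardized loadings for four indicators.
loadings = {"OV1": 0.82, "OV2": 0.65, "OV3": 0.35, "OV4": -0.58}

kept = retained_indicators(loadings)
for name in kept:
    print(name, loadings[name], round(variance_explained(loadings[name]), 3))
```

Here OV3 falls below the guideline and is left out, while OV4 is kept because its negative sign only indicates the direction of the association.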

11. Covariance

The covariance between two variables is their correlation multiplied by the product of their standard deviations. It is represented by a two-sided arrow, and the covariance of a variable with itself is that variable's variance (Kenny, 2011).
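Both statements can be checked numerically. The sketch below uses hypothetical data and computes the sample covariance directly, then again as correlation times the product of the standard deviations, with the correlation obtained independently from z-scores.

```python
from statistics import mean, stdev, variance

def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Pearson correlation computed from the products of z-scores."""
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    z_products = (((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y))
    return sum(z_products) / (len(x) - 1)

# Hypothetical data: hours studied and test score.
hours = [2, 4, 6, 8, 10]
score = [50, 55, 65, 70, 85]

cov_direct = covariance(hours, score)
cov_from_r = correlation(hours, score) * stdev(hours) * stdev(score)

print(cov_direct, cov_from_r)                         # the two routes agree
print(covariance(hours, hours), variance(hours))      # covariance with itself is the variance
```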

12. Just-identified model

A just-identified model is also known as a saturated model. The number of unknowns (the free parameters) is exactly equal to the number of equations, i.e. the model has zero degrees of freedom. The value of each free parameter can be obtained from the observed data in exactly one way, so a unique solution exists, e.g. x + y = 5 and 2x + y = 8 give x = 3 and y = 2 (Kenny, 2011; Suhr, 2006). It is ideal in the sense that a unique solution exists, but because such a model reproduces the data exactly, its fit cannot be tested.

13. Over-identified model

An over-identified model is one in which the parameters are identified and the number of unknown parameters is less than the number of equations. Because there are more equations than unknowns, there is no exact solution, e.g. x + y = 6, 2x + y = 9, and x + 2y = 10. In that sense this is not an ideal model. Its advantage, however, is that the model can still be tested for fit (Kenny, 2011; Suhr, 2006).
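The equation examples given for the just-identified and over-identified cases can be verified with a few lines of code. The sketch below solves a pair of linear equations by Cramer's rule and then checks the leftover third equation; the function name is hypothetical, and the equations are only an analogy for how model parameters are solved from the observed data.

```python
def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("no unique solution")
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y

# Just-identified: two equations, two unknowns, one exact solution.
just_identified = solve_2x2(1, 1, 5, 2, 1, 8)

# Over-identified: solve the first two equations, then test the third.
x, y = solve_2x2(1, 1, 6, 2, 1, 9)
residual = 10 - (x + 2 * y)   # how far x + 2y = 10 is from holding

print(just_identified)        # unique solution of the first example
print((x, y), residual)       # a non-zero residual means no exact solution
```

The non-zero residual is what makes testing possible: in an over-identified model, the size of such discrepancies is what fit measures summarize.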

14. Under-identified model

An under-identified model is one for which a single unique value cannot be determined or estimated from the observed data for one or more of its free parameters, so some parameters remain unidentified. For example, with x + y = 6 alone, the value of one variable depends entirely on the other; because there are more unknowns than equations, the model is under-identified (Kenny, 2011; Suhr, 2006). This is not an ideal model. It must be rejected as specified, and no further analysis can be performed on it.
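In practice, the three cases are usually told apart by counting. With p observed variables, the covariance matrix supplies p(p + 1)/2 distinct variances and covariances (the "equations"), and the degrees of freedom are that count minus the number of free parameters. The sketch below applies this counting rule; the function names and the parameter counts are hypothetical. The rule is a necessary condition only: a model can pass it and still have individual parameters that are not identified.

```python
def degrees_of_freedom(n_observed, n_free_parameters):
    """Distinct variances and covariances minus free parameters."""
    n_moments = n_observed * (n_observed + 1) // 2
    return n_moments - n_free_parameters

def identification_status(df):
    """Classify a model by the sign of its degrees of freedom."""
    if df < 0:
        return "under-identified"
    if df == 0:
        return "just-identified"
    return "over-identified"

# Six observed variables give 6 * 7 / 2 = 21 distinct moments.
for n_params in (25, 21, 15):   # hypothetical numbers of free parameters
    df = degrees_of_freedom(6, n_params)
    print(n_params, df, identification_status(df))
```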

Visual representation of the foundations of SEM

As the relevance of structural equation modeling (SEM) grows, formulating an effective model is often complicated by the technical meaning attached to each symbol. The major terms of an SEM model are therefore represented diagrammatically here to convey them more clearly. Note that the figure below only illustrates the foundations; it is not a template for a path diagram, which varies with the research requirement and the variables.

Figure 1: Foundations of structural equation modeling

The figure above shows OV1 to OV6 as observed variables and LV1 to LV3 as latent variables, with e1 to e6 as the measurement errors. LV1 is the exogenous variable, LV3 is the endogenous variable, and LV2 is the mediating variable. Thus, the impact of LV1 on LV3 is the direct effect, while the impact of LV1 on LV3 via LV2 is the indirect effect. Each one-sided arrow depicts a one-directional relationship, while the two-sided arrow, i.e. the one between e1 and e2, represents a covariance.
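The structural part of this diagram can be written down as a list of arrows, from which the roles of the variables follow mechanically. The sketch below encodes only the three latent-variable paths described for Figure 1; the function names are hypothetical. Note that by the arrow-based definition LV2 also counts as endogenous, since it receives an arrow from LV1.

```python
# One-sided arrows between the latent variables, as (source, target).
structural_paths = [("LV1", "LV2"), ("LV2", "LV3"), ("LV1", "LV3")]

def endogenous(paths):
    """Variables that receive at least one arrow."""
    return {target for _, target in paths}

def exogenous(paths):
    """Variables that send arrows but receive none."""
    return {source for source, _ in paths} - endogenous(paths)

def routes(paths, start, end, visited=()):
    """All directed routes from start to end that never revisit a variable."""
    if start == end:
        return [[end]]
    found = []
    for source, target in paths:
        if source == start and target not in visited:
            for tail in routes(paths, target, end, visited + (start,)):
                found.append([start] + tail)
    return found

all_routes = routes(structural_paths, "LV1", "LV3")
direct = [r for r in all_routes if len(r) == 2]     # a single arrow
indirect = [r for r in all_routes if len(r) > 2]    # passes through a mediator

print(sorted(exogenous(structural_paths)), sorted(endogenous(structural_paths)))
print(direct, indirect)
```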

The models can be of different types

Over time, structural equation modeling has grown into a relevant methodology for establishing the links between variables and presenting their relationships in a more visual form. Examining key terms such as latent variable, observed variable, measurement error, covariance, and estimates provides a basic understanding of how SEM works.

