Understanding the foundations of structural equation modeling

By Riya Jain & Priya Chetty on October 14, 2021

Structural equation modeling (SEM) is a powerful multivariate technique used increasingly in scientific investigations to evaluate and test the relationships between variables. It differs from standalone modeling methods such as regression and factor analysis, although it builds on both. SEM helps in determining the direct and indirect effects among variables. Although its origins go back nearly 100 years, the methodology has evolved significantly, and a number of new parameters and indicators have been added to it. To understand what these represent, it is essential to first understand the meaning of the underlying terms and values.

The previous article introduced the concept of SEM analysis, its meaning, relevance, and benefits. It is also important to understand the basic terminology on which SEM analysis is built.

Basic terminologies of structural equation modeling

Structural equation modeling (SEM) can be performed in software such as LISREL and SPSS AMOS. Irrespective of the software in use, the same model estimated with the same method yields the same figures and values. The main terms used while formulating a model and analyzing the relationships between variables are described below.

1. Measured or Observed variable

An observed (measured) variable is one whose values are directly measured, i.e. numeric data for it are available or can be extracted from other sources for statistical analysis. In a model, a measured variable is represented by a rectangle (USF, 2021).

2. Latent variable

A latent variable is a variable for which no numeric data are available because it is not measured directly; its value is inferred from the observed variables in the model. The SEM model is developed by including such non-measured variables. Latent variables in the model are represented by oval shapes (Kenny, 2011; USF, 2021).

3. Measurement error

Measurement error is the error or bias present while measuring the value of a variable (i.e. while collecting the data for it). A perfectly measured variable would have a reliability (its correlation with a repeated measurement of itself) of 1.00. The gap between a measure's actual reliability and 1.00 therefore reflects its measurement error. Because there is always a possibility of error in measured or observed variables, each of them carries a ‘measurement error’ term. Latent variables, on the other hand, do not have a measurement error because they have no measured numeric value. Their values are derived by the model during computation rather than measured by the researcher (Kenny, 2011).
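The link between measurement error and the 1.00 benchmark can be illustrated with a short simulation. This is a minimal sketch under assumed conditions. It uses the classical test-theory set-up (observed score = true score + independent random error). The variances are invented, with a true-score variance of 1 and an error variance of 0.25. Reliability is computed as true-score variance divided by observed variance, so 1 - reliability is the share attributable to measurement error.

```python
import numpy as np

# Hypothetical simulation: observed score = true score + random measurement error.
rng = np.random.default_rng(seed=42)
n = 100_000

true_score = rng.normal(loc=0.0, scale=1.0, size=n)  # true-score variance 1 (assumed)
error = rng.normal(loc=0.0, scale=0.5, size=n)       # error variance 0.25 (assumed)
observed = true_score + error

# Reliability: share of the observed variance that is true-score variance.
reliability = true_score.var() / observed.var()

# A second, independent measurement of the same true score.
# The two measurements correlate at roughly the reliability, not at 1.00.
retest = true_score + rng.normal(loc=0.0, scale=0.5, size=n)
test_retest_r = np.corrcoef(observed, retest)[0, 1]

print(f"theoretical reliability: {1.0 / (1.0 + 0.25):.3f}")
print(f"simulated reliability:   {reliability:.3f}")
print(f"test-retest correlation: {test_retest_r:.3f}")
print(f"measurement error share: {1 - reliability:.3f}")
```

With these assumed variances, both the simulated reliability and the test-retest correlation should land close to 0.8. The remaining 0.2 reflects measurement error.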

4. Unstandardized estimate

Unstandardized estimates retain the scaling information of the variables, so they are interpreted in the variables' original units of measurement. Thus, when developing the linkages between variables and assessing their impact, the unstandardized estimate is used (Suhr, 2006).

5. Standardized estimate

Standardized estimates are the estimates obtained when the variables are rescaled to a mean of 0 and a variance of 1, i.e. the unstandardized estimates are transformed to remove the scaling information from the model. Removing scaling is useful because, in some cases, the independent variables are measured on different scales. This unevenness makes unstandardized estimates difficult to compare directly. Standardized estimates can therefore be used to measure effect size, but they are mainly used for informal comparisons of parameters (Kenny, 2011; Suhr, 2006). However, for models in which all independent variables are measured on the same scale, standardization is not essential.
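The following sketch shows how an unstandardized slope becomes a standardized one. The data are hypothetical: a predictor on a large scale and an outcome on a small scale. It uses a single-predictor least-squares regression rather than a full SEM. The slope is rescaled by the ratio of standard deviations, and the result is checked against the slope obtained after z-scoring both variables.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 500

# Hypothetical data on very different scales (e.g. a large-scale predictor
# such as income and a small-scale outcome); values are invented.
x = rng.normal(loc=50_000, scale=12_000, size=n)
y = 0.0002 * x + rng.normal(loc=0.0, scale=2.0, size=n)

# Unstandardized slope: change in y (original units) per one-unit change in x.
b_unstd = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Standardized slope: rescale by the standard deviations (mean 0, variance 1 metric).
beta_std = b_unstd * np.std(x, ddof=1) / np.std(y, ddof=1)

# The same value is obtained by z-scoring both variables first.
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
beta_from_z = np.cov(zx, zy, ddof=1)[0, 1] / np.var(zx, ddof=1)

print(f"unstandardized b:  {b_unstd:.6f}")
print(f"standardized beta: {beta_std:.3f} (from z-scores: {beta_from_z:.3f})")
```

The unstandardized slope is tiny only because x is measured in large units. With a single predictor, the standardized slope equals the correlation between x and y. This is why standardized values are handy for informal comparisons across differently scaled variables.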

6. Endogenous variable

A variable whose value depends on one or more other variables in the model is called an endogenous variable. In more common terms, an endogenous variable is also known as a dependent or response variable (Kenny, 2011).

7. Exogenous variable

Exogenous variables are those whose values are determined outside the model and which help determine the values of other variables. No other variable in the model points to them. They can also be called independent or predictor variables (Kenny, 2011).
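The endogenous/exogenous distinction can be read mechanically off a path diagram: any variable that receives a single-headed arrow is endogenous, and the rest are exogenous. The sketch below assumes a hypothetical three-variable diagram (X, M, Y) stored as a list of (cause, effect) pairs. Error terms and two-headed covariance arrows are deliberately left out for simplicity.

```python
# Hypothetical path diagram: each tuple (cause, effect) is a single-headed arrow.
paths = [
    ("X", "M"),  # X -> M
    ("M", "Y"),  # M -> Y
    ("X", "Y"),  # X -> Y
]


def classify_variables(paths):
    """Return (exogenous, endogenous) sets for a list of directed paths."""
    variables = {v for path in paths for v in path}
    # A variable that receives at least one single-headed arrow is endogenous.
    endogenous = {effect for _, effect in paths}
    # Everything else only sends arrows, so it is exogenous.
    exogenous = variables - endogenous
    return exogenous, endogenous


exo, endo = classify_variables(paths)
print("exogenous:", sorted(exo))
print("endogenous:", sorted(endo))
```

Here X is the only exogenous variable. M and Y are both endogenous, because each receives at least one arrow.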

8. Direct effect

A direct effect is the directional relationship between two variables, i.e. the impact of one variable on another, shown by the direction of the arrow. It links an independent variable to a dependent variable (Suhr, 2006).

9. Indirect effect

An indirect effect is the effect of an independent variable on another variable that is transmitted through one or more intervening variables. It is used to establish mediating relationships between variables (Suhr, 2006).
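Direct and indirect effects can be illustrated with a simple mediation model, X → M → Y plus a direct path X → Y. This is a sketch on simulated data with invented path values. It uses ordinary least-squares regressions (the classic product-of-coefficients approach) in place of a full SEM estimator. The direct effect is the X → Y coefficient with M controlled (c'). The indirect effect is the product a × b of the X → M and M → Y paths.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 1_000

# Hypothetical mediation model with assumed true paths: a = 0.5, b = 0.4, c' = 0.3.
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)
y = 0.3 * x + 0.4 * m + rng.normal(size=n)


def ols(outcome, *predictors):
    """Least-squares slopes; an intercept is fitted but not returned."""
    design = np.column_stack([np.ones_like(outcome), *predictors])
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs[1:]


(a,) = ols(m, x)           # path X -> M
c_prime, b = ols(y, x, m)  # direct path X -> Y, and path M -> Y
(c,) = ols(y, x)           # total effect of X on Y

indirect = a * b
print(f"direct effect c'    = {c_prime:.3f}")
print(f"indirect effect a*b = {indirect:.3f}")
print(f"total c = {c:.3f} vs c' + a*b = {c_prime + indirect:.3f}")
```

With observed variables and least-squares estimation, the total effect splits exactly into the direct effect plus the indirect effect. This decomposition is what makes the indirect path a measure of mediation.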

10. Factor loading

The factor loading indicates how strongly an indicator (observed variable) is related to its latent construct. For a standardized loading, its square is the share of the indicator's variance accounted for by the construct. Because the loading measures the linkage of observed variables with the latent construct, it is often recommended to include variables with a loading of more than 0.4. A lower value could also be considered, since it still represents an association. As the factor loading reflects the association between variables, it can be positive or negative, depicting the direction of that association (Meyer, 2020).
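The sketch below turns a set of standardized loadings into explained and error variance and applies the 0.4 rule of thumb. The loadings are hypothetical illustration values, not output from any real model. The code assumes each indicator loads on a single factor, so the squared loading is the explained-variance share. Because a loading can be negative, the threshold is compared against its absolute value.

```python
# Hypothetical standardized loadings (illustrative values, not from any real study).
loadings = {"OV1": 0.82, "OV2": 0.67, "OV3": 0.35, "OV4": -0.55}


def summarize_loadings(loadings, threshold=0.4):
    """Report explained variance, error variance and a threshold flag per indicator.

    Assumes standardized loadings with each indicator loading on one factor,
    so the squared loading is the share of variance explained by the factor.
    """
    summary = {}
    for indicator, lam in loadings.items():
        explained = lam ** 2
        summary[indicator] = {
            "explained": round(explained, 4),
            "error": round(1 - explained, 4),
            # The sign shows the direction of association; the size is judged by magnitude.
            "meets_threshold": abs(lam) > threshold,
        }
    return summary


for name, row in summarize_loadings(loadings).items():
    print(name, row)
```

OV3 falls below the rule of thumb, though, as noted above, a researcher may still choose to retain it. OV4 passes despite its negative sign, which only indicates an inverse association.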

11. Covariance

The covariance between two variables is their correlation multiplied by the product of their standard deviations. It is represented by a two-headed arrow, and the covariance of a variable with itself is that variable's variance (Kenny, 2011).
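Both statements in the definition can be checked numerically. First, covariance equals correlation times the product of the standard deviations. Second, the covariance of a variable with itself is its variance. The data below are arbitrary simulated values used only for the check.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Arbitrary simulated data for illustration.
x = rng.normal(loc=10, scale=3, size=200)
y = 2 * x + rng.normal(scale=4, size=200)

cov_xy = np.cov(x, y, ddof=1)[0, 1]
r_xy = np.corrcoef(x, y)[0, 1]
sd_x, sd_y = x.std(ddof=1), y.std(ddof=1)

# Covariance = correlation x product of the standard deviations.
print(np.isclose(cov_xy, r_xy * sd_x * sd_y))

# Covariance of a variable with itself = its variance.
print(np.isclose(np.cov(x, x, ddof=1)[0, 1], x.var(ddof=1)))
```

Both checks print True.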

12. Just-identified model

A just-identified model is also known as a saturated model. The number of unknowns (free parameters) is exactly equal to the number of equations (known variances and covariances), i.e. the model has zero degrees of freedom. Each free parameter can therefore be obtained from the observed data in exactly one way, so there is a unique solution. For example, x + y = 5 and 2x + y = 8 give x = 3 and y = 2 (Kenny, 2011; Suhr, 2006). In this sense it is an ideal model; however, because it reproduces the data perfectly, its fit cannot be tested.
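The equation analogy from the text can be solved directly. The sketch simply solves the two-equation system with NumPy. It shows that there is exactly one solution and that it reproduces both equations with no residual. This is only an analogy; an SEM program estimates free parameters from a covariance matrix rather than from these equations.

```python
import numpy as np

# Just-identified example from the text: x + y = 5 and 2x + y = 8.
A = np.array([[1.0, 1.0],
              [2.0, 1.0]])
b = np.array([5.0, 8.0])

solution = np.linalg.solve(A, b)  # exactly one solution exists
print(np.round(solution, 6))      # [3. 2.]

# Zero residual: the solution reproduces both equations perfectly.
# By analogy, a just-identified model always "fits" and so cannot be tested.
print(np.allclose(A @ solution, b))
```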

13. Over-identified model

In an over-identified model, the parameters are identified and the number of unknown parameters is less than the number of equations. Because there are more equations than unknowns, there is usually no exact solution. For example, in x + y = 6, 2x + y = 9, and x + 2y = 10, the first two equations give x = 3 and y = 3, which do not satisfy the third. This is not an ideal model in the sense of having an exact solution. However, an advantage of over-identification is that the model can still be tested for fit (Kenny, 2011; Suhr, 2006).
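For the over-identified example, a "best compromise" solution can be found by least squares, which minimizes the leftover misfit across all three equations. This is a hedged analogy only. SEM estimators such as maximum likelihood minimize a different discrepancy function between observed and model-implied covariances. Still, the idea is similar: a nonzero leftover misfit is what the model's fit test evaluates.

```python
import numpy as np

# Over-identified example from the text: x + y = 6, 2x + y = 9, x + 2y = 10.
A = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 9.0, 10.0])

# Least squares finds the (x, y) that minimizes the sum of squared misfits.
solution, ssr, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print("best compromise:", np.round(solution, 3))         # [2.636 3.636]
print("sum of squared residuals:", np.round(ssr, 4))     # [0.0909]
```

The residual is not zero, so no pair (x, y) satisfies all three equations at once. This leftover discrepancy is what makes fit testable.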

14. Under-identified model

An under-identified model is one in which a single unique value cannot be determined or estimated from the observed data for one or more of its free parameters, so some parameters remain unidentified. For example, in x + y = 6 the value of one variable depends entirely on the other. Since there are more unknowns than equations, the model is under-identified (Kenny, 2011; Suhr, 2006). This is not an ideal model. It cannot be estimated as specified and must be rejected (or respecified) before any further analysis can be performed.
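The three identification cases can be summarized with a simple counting rule, often called the t-rule. With p observed variables, there are p(p + 1)/2 distinct variances and covariances (the "equations"). Subtracting the number of free parameters t gives the degrees of freedom. The sketch below assumes hypothetical parameter counts. A negative df guarantees under-identification, but df ≥ 0 is a necessary rather than a sufficient condition. The last part shows why x + y = 6 has no unique solution.

```python
import numpy as np


def identification_status(n_observed, n_free_params):
    """Apply the counting (t) rule: df = p(p + 1) / 2 - t.

    p = number of observed variables, t = number of free parameters.
    A negative df guarantees under-identification; df >= 0 is necessary
    but not sufficient, so the other two labels are only provisional.
    """
    known = n_observed * (n_observed + 1) // 2  # distinct variances + covariances
    df = known - n_free_params
    if df < 0:
        return df, "under-identified"
    if df == 0:
        return df, "just-identified"
    return df, "over-identified"


# Illustrative (hypothetical) counts: (observed variables, free parameters).
for p, t in [(3, 6), (3, 7), (6, 13)]:
    print(p, t, identification_status(p, t))

# The text's example x + y = 6: one equation, two unknowns.
A = np.array([[1.0, 1.0]])
print("rank:", np.linalg.matrix_rank(A), "unknowns:", A.shape[1])

# Many different (x, y) pairs satisfy it, so no unique solution exists.
for x in (0.0, 2.5, 6.0):
    y = 6.0 - x
    assert np.isclose((A @ np.array([x, y]))[0], 6.0)
```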

Visual representation of the foundations of SEM

As structural equation modeling (SEM) becomes more widely used, the technical meaning attached to each symbol often makes the formulation of an effective model complicated. Therefore, the major terms of an SEM model are represented diagrammatically below to make them easier to interpret. However, the figure below only illustrates the foundations. It is not a prescribed path diagram, since the actual diagram varies with the research requirements and variables.

Figure 1: Foundations of structural equation modeling

The above figure shows that OV1 to OV6 are observed variables, while LV1 to LV3 are latent variables. Here, e1 to e6 are measurement errors. LV1 is the exogenous variable and LV3 is the endogenous variable of the model. LV2 is the mediating variable; strictly, it is also endogenous, since LV1 predicts it. Thus, the impact of LV1 on LV3 is a direct effect, while the impact of LV1 on LV3 via LV2 is an indirect effect. All single-headed arrows depict one-directional relationships, while the two-headed arrow between e1 and e2 represents a covariance.

The models can be of different types

Over time, structural equation modeling has become a relevant methodology for building linkages between variables and presenting a more visual view of their relationships. Examining key terms such as latent variable, observed variable, measurement error, covariance, and estimates provides a basic understanding of how SEM works.



Priya is the co-founder and Managing Partner of Project Guru, a research and analytics firm based in Gurgaon. She is responsible for the human resource planning and operations functions. Her expertise in analytics has been used in a number of service-based industries like education and financial services.

Her foundational education is from St. Xavier's High School (Mumbai). She also holds an MBA degree in Marketing and Finance from the Indian Institute of Planning and Management, Delhi (2008).

Some of the notable projects she has worked on include:

  • Using systems thinking to improve sustainability in operations: A study carried out in Malaysia in partnership with Universiti Kuala Lumpur.
  • Assessing customer satisfaction with in-house doctors of Jiva Ayurveda (a project executed for the company)
  • Predicting the potential impact of green hydrogen microgrids (a project executed for the Government of South Africa)

She is a key contributor to the in-house research platform Knowledge Tank.

She currently holds over 300 citations from her contributions to the platform.

She has also been a guest speaker at various institutes such as JIMS (Delhi), BPIT (Delhi), and SVU (Tirupati).


I hold a master's in Economics from Amity University. With a keen interest in econometrics and data analysis, I was part of the Innovation Project of Daulat Ram College, Delhi University. My core expertise and interest are in environment-related issues. Apart from academics, I love music and exploring new places.