Understanding the foundations of structural equation modeling
Structural equation modeling (SEM) is a powerful multivariate technique that is increasingly used in scientific investigations to evaluate and test the relationships between variables. It differs from other modeling methods such as regression and factor analysis. SEM helps determine the direct and indirect effects in the relationships between variables. Although the approach is nearly 100 years old, it has evolved significantly, and a number of new parameters and indicators have been added to the model over time. To understand what these values represent, it is essential to first understand the terminology behind them.
The previous article introduced the concept of SEM analysis, its meaning, relevance, and benefits. However, it is also important to understand different terminologies that form the basics of SEM analysis.
Basic terminologies of structural equation modeling
Structural equation modeling (SEM) can be applied in software such as LISREL and SPSS AMOS. Irrespective of the software in use, the terminology and the interpretation of the figures and values that emerge remain the same. The main terminologies used while formulating a model and analyzing the relationships between variables are stated below.
1. Measured or Observed variable
A measured or observed variable is one whose values are directly measured, i.e., numeric data for the variable are available or can be extracted from other sources for statistical analysis. In a model, a measured variable is represented by a rectangle (USF, 2021).
2. Latent variable
A latent variable is a variable that is not directly measured, so no numeric data are available for it. It is inferred from observed variables, and the SEM model is developed by including such non-measured variables. Latent variables are represented by oval shapes (Kenny, 2011; USF, 2021).
3. Measurement error
Measurement error is the error or bias that arises while measuring the value of a variable, i.e., while collecting the data for it. Measured or observed variables are always subject to some error, so an indicator rarely relates perfectly to what it is meant to capture; the shortfall from a perfect correlation of 1.00 reflects this measurement error. The latent variable, on the other hand, is treated as free of measurement error because its value is not measured by the researcher but is derived by the model during computation (Kenny, 2011).
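The link between measurement error and a correlation below 1.00 can be illustrated with a small simulation. This is only a minimal sketch, assuming the classical view that an observed score equals a true score plus random error; the sample size, the error spread of 0.5, and all variable names are hypothetical and do not come from the cited sources.

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is reproducible

n = 5000
# Hypothetical true scores of a construct (never observed directly in practice)
true_scores = [random.gauss(0.0, 1.0) for _ in range(n)]
# Assumption: observed score = true score + random measurement error (SD 0.5)
observed_scores = [t + random.gauss(0.0, 0.5) for t in true_scores]


def pearson(x, y):
    """Pearson correlation computed from deviations around the means."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


r_true_observed = pearson(true_scores, observed_scores)
# The gap between this correlation and 1.00 is produced by measurement error
shortfall = 1.0 - r_true_observed
print(f"correlation = {r_true_observed:.3f}, shortfall from 1.00 = {shortfall:.3f}")
```

With these assumed settings the correlation should land somewhat below 1.00; a larger error spread widens the gap.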
4. Unstandardized estimate
Unstandardized estimates retain the scaling information of the variables and are interpreted with reference to the variables' original scales. Thus, when developing the linkage between variables and assessing impact, the unstandardized estimate is used (Suhr, 2006).
5. Standardized estimate
Standardized estimates are obtained by transforming unstandardized estimates to remove scaling information from the model, which corresponds to expressing the variables with a mean of 0 and a variance of 1. Removing the scale matters because, in some cases, the independent variables are measured on different scales. Uneven scaling makes coefficients hard to compare and increases the chance of misinterpreting the results. Standardized estimates can therefore be used to gauge effect size, but they are mainly used for informal comparisons of parameters (Kenny, 2011; Suhr, 2006). For models in which all independent variables are measured on one scale, standardization is not essential.
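The difference between the two kinds of estimate can be sketched with a single-predictor regression, where the standardized slope equals the unstandardized slope multiplied by the ratio of the standard deviations of predictor and outcome. The data and variable names below are hypothetical, and a one-predictor regression is used as a simplified stand-in for a path in an SEM model.

```python
import statistics

# Hypothetical data: x in thousands of currency units, y in units sold
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [12.0, 19.0, 29.0, 37.0, 45.0]


def slope(pred, out):
    """Unstandardized least-squares slope of out on pred."""
    mp, mo = statistics.mean(pred), statistics.mean(out)
    num = sum((p - mp) * (o - mo) for p, o in zip(pred, out))
    den = sum((p - mp) ** 2 for p in pred)
    return num / den


def standardized_slope(pred, out):
    """Slope rescaled by SD(pred) / SD(out), which removes the units."""
    return slope(pred, out) * statistics.stdev(pred) / statistics.stdev(out)


b_unstd = slope(x, y)
b_std = standardized_slope(x, y)

# Express the same predictor in single currency units instead of thousands
x_rescaled = [value * 1000 for value in x]
b_unstd_rescaled = slope(x_rescaled, y)
b_std_rescaled = standardized_slope(x_rescaled, y)

print(b_unstd, b_unstd_rescaled)  # changes with the scale of x
print(b_std, b_std_rescaled)      # unaffected by the scale of x
```

Rescaling the predictor changes the unstandardized slope by the same factor, while the standardized slope stays the same, which is why the latter suits informal comparisons across differently scaled variables.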
6. Endogenous variable
A variable whose value depends on one or more other variables in the model is called an endogenous variable. In more common terms, an endogenous variable is also referred to as a dependent or response variable (Kenny, 2011).
7. Exogenous variable
Exogenous variables are those whose values are determined outside the model and that help determine the values of other variables. They are also referred to as independent or predictor variables (Kenny, 2011).
8. Direct effect
A direct effect is the directional relationship between two variables, i.e., the impact of one variable on another, marked by the direction of the arrow. It involves an independent and a dependent variable (Suhr, 2006).
9. Indirect effect
An indirect effect is the effect of an independent variable on another variable that operates through one or more intervening variables. It is an effective measure for establishing mediating relationships between variables (Suhr, 2006).
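Direct and indirect effects can be combined arithmetically in a simple linear mediation setting. The sketch below assumes one predictor, one mediator, and one outcome, with the indirect effect taken as the product of the two coefficients along the mediated route; the function name and the coefficient values are hypothetical.

```python
def decompose_effects(a, b, c_prime):
    """Split a total effect into direct and indirect parts.

    a        coefficient of the path predictor -> mediator
    b        coefficient of the path mediator -> outcome
    c_prime  coefficient of the direct path predictor -> outcome
    """
    indirect = a * b            # effect transmitted through the mediator
    total = c_prime + indirect  # direct plus indirect (linear model assumed)
    return {"direct": c_prime, "indirect": indirect, "total": total}


# Hypothetical path coefficients, for illustration only
effects = decompose_effects(a=0.5, b=0.4, c_prime=0.3)
print(effects)
```

The product rule holds for linear models of this form; with several mediators, each mediated route contributes its own product to the indirect effect.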
10. Factor loading
A factor loading estimates how strongly an indicator is related to its latent construct; the squared standardized loading gives the share of the indicator's variance that is accounted for by the construct. Because loadings measure the linkage between observed variables and the latent construct, it is often recommended to include variables with a loading of more than 0.4, although a lower value may also be considered since it still represents an association. A loading can be positive or negative, depending on the nature of the association (Meyer, 2020).
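The 0.4 guideline and the variance interpretation can be sketched as follows. The loadings are hypothetical, they are assumed to be standardized with each indicator loading on a single construct, and comparing the absolute value against the threshold is an assumption made here so that negative loadings are judged by their strength.

```python
# Hypothetical standardized loadings of four indicators on one construct
loadings = {"OV1": 0.82, "OV2": 0.65, "OV3": -0.71, "OV4": 0.35}
THRESHOLD = 0.4  # guideline value mentioned in the text

summary = {}
for name, loading in loadings.items():
    explained = loading ** 2  # share of indicator variance due to the construct
    summary[name] = {
        "explained": explained,
        "unexplained": 1.0 - explained,  # share left to error / uniqueness
        "meets_threshold": abs(loading) > THRESHOLD,  # sign ignored (assumption)
    }

for name, row in summary.items():
    print(name, round(row["explained"], 3), row["meets_threshold"])
```

In this made-up set, OV4 falls below the guideline, while OV3 passes despite its negative sign because only the strength of the association is compared.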
11. Covariance
The covariance between two variables is their correlation multiplied by the product of their standard deviations. It is represented by a two-sided arrow, and the covariance of a variable with itself is that variable's variance (Kenny, 2011).
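Both statements about covariance can be checked numerically. The following is a small sketch with hypothetical data; the correlation is computed separately from standardized scores so that the comparison with the directly computed covariance is not circular.

```python
import statistics

# Hypothetical paired observations
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]


def covariance(u, v):
    """Sample covariance (divisor n - 1)."""
    mu, mv = statistics.mean(u), statistics.mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def correlation(u, v):
    """Pearson correlation as the average product of standardized scores."""
    mu, mv = statistics.mean(u), statistics.mean(v)
    su, sv = statistics.stdev(u), statistics.stdev(v)
    z_products = (((a - mu) / su) * ((b - mv) / sv) for a, b in zip(u, v))
    return sum(z_products) / (len(u) - 1)


cov_xy = covariance(x, y)
cov_from_r = correlation(x, y) * statistics.stdev(x) * statistics.stdev(y)
var_x = covariance(x, x)  # covariance of a variable with itself

print(cov_xy, cov_from_r)              # the two routes agree
print(var_x, statistics.variance(x))   # and cov(x, x) is the variance of x
```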
12. Just-identified model
A just-identified model is also known as a saturated model. The number of unknowns (the free parameters) is exactly equal to the number of equations, so the model has zero degrees of freedom. Each free parameter can be obtained in exactly one way from the observed data, which gives a unique solution, e.g., x + y = 5 and 2x + y = 8 yield x = 3 and y = 2 (Kenny, 2011; Suhr, 2006). Such a model always reproduces the observed data exactly, so its fit cannot be tested.
13. Over-identified model
In an over-identified model, all parameters are identified and the number of unknown parameters is less than the number of equations. Because there are more equations than unknowns, no exact solution satisfies them all, e.g., x + y = 6, 2x + y = 9, and x + 2y = 10. Although the lack of an exact solution may seem less than ideal, the model has positive degrees of freedom, and the advantage is that it can still be tested for fit (Kenny, 2011; Suhr, 2006).
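The two equation examples above can be worked through in code. This is only an analogy under stated assumptions: the just-identified pair is solved exactly, while the over-identified set is fitted by ordinary least squares, used here as a simple stand-in for the estimators that SEM software actually applies. The helper name solve_2x2 is hypothetical.

```python
from fractions import Fraction


def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule."""
    det = Fraction(a11 * a22 - a12 * a21)
    if det == 0:
        raise ValueError("the two equations do not give a unique solution")
    x = Fraction(b1 * a22 - a12 * b2) / det
    y = Fraction(a11 * b2 - b1 * a21) / det
    return x, y


# Just-identified analogy: x + y = 5 and 2x + y = 8
x_just, y_just = solve_2x2(1, 1, 5, 2, 1, 8)

# Over-identified analogy: x + y = 6, 2x + y = 9, x + 2y = 10
# Each row is (coefficient of x, coefficient of y, right-hand side)
rows = [(1, 1, 6), (2, 1, 9), (1, 2, 10)]

# Least squares through the normal equations (A^T A) p = A^T b
s_xx = sum(cx * cx for cx, cy, rhs in rows)
s_xy = sum(cx * cy for cx, cy, rhs in rows)
s_yy = sum(cy * cy for cx, cy, rhs in rows)
t_x = sum(cx * rhs for cx, cy, rhs in rows)
t_y = sum(cy * rhs for cx, cy, rhs in rows)
x_over, y_over = solve_2x2(s_xx, s_xy, t_x, s_xy, s_yy, t_y)

# What each equation still misses after the best compromise is found
residuals = [rhs - (cx * x_over + cy * y_over) for cx, cy, rhs in rows]

print(x_just, y_just)    # exact solution of the just-identified pair
print(x_over, y_over)    # compromise values for the over-identified set
print(residuals)         # non-zero leftovers that make a fit test possible
```

The non-zero residuals are the counterpart of model misfit: because the equations cannot all be satisfied, how closely they are satisfied becomes something that can be evaluated.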
14. Under-identified model
An under-identified model is one in which a single unique value cannot be determined or estimated from the observed data for one or more free parameters, so some parameters remain unidentified. For example, with x + y = 6 alone, the value of one variable depends entirely on the other; there are more unknowns than equations, hence the model is under-identified (Kenny, 2011; Suhr, 2006). This is not an ideal model: it cannot be estimated as specified and must be rejected or respecified before any further analysis.
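The three identification cases can be told apart by counting. The sketch below assumes the usual counting rule for covariance-based models, in which p observed variables supply p(p + 1)/2 unique variances and covariances, and the degrees of freedom are that number minus the free parameters. The rule is a necessary check rather than a guarantee of identification, and the parameter counts in the examples are hypothetical.

```python
def identification_status(n_observed, n_free_parameters):
    """Classify a model by the counting rule (necessary, not sufficient).

    n_observed         number of observed variables in the model
    n_free_parameters  number of parameters the model has to estimate
    """
    # Unique variances and covariances available from the data
    n_moments = n_observed * (n_observed + 1) // 2
    degrees_of_freedom = n_moments - n_free_parameters
    if degrees_of_freedom > 0:
        status = "over-identified"
    elif degrees_of_freedom == 0:
        status = "just-identified"
    else:
        status = "under-identified"
    return degrees_of_freedom, status


# Hypothetical parameter counts, for illustration only
print(identification_status(6, 13))  # 21 moments, 13 parameters
print(identification_status(3, 6))   # 6 moments, 6 parameters
print(identification_status(2, 4))   # 3 moments, 4 parameters
```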
Visual representation of the foundations of SEM
As structural equation modeling (SEM) grows in relevance, formulating an effective model is often complicated by the technical meaning attached to each symbol. The major terminologies of an SEM model are therefore represented diagrammatically here to make them easier to interpret. The figure below only illustrates the foundations; it is not a prescribed path diagram, since the diagram varies with the research requirement and the variables involved.
The above figure shows that OV1 to OV6 are observed variables, while LV1 to LV3 are latent variables. Here, e1 to e6 are the measurement errors. LV1 is the exogenous variable, LV3 is the endogenous variable, and LV2 is the mediating variable; thus, the impact of LV1 on LV3 is a direct effect, while the impact of LV1 on LV3 via LV2 is an indirect effect. All one-sided arrows depict one-directional relationships, while the two-sided arrow, i.e., the one between e1 and e2, denotes covariance.
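The structural part of the figure can also be written down as data and inspected programmatically. The sketch below encodes only the three latent-variable paths described in the text; which observed variables belong to which latent variable is not stated, so the measurement part is left out. Variables are classified by whether any arrow points at them, which means LV2, as the mediator, is also counted as endogenous here. The function name paths_between is hypothetical.

```python
# One-sided arrows between latent variables, as described in the text
structural_paths = [("LV1", "LV2"), ("LV2", "LV3"), ("LV1", "LV3")]

latent_variables = sorted({v for path in structural_paths for v in path})
receives_arrow = {target for _, target in structural_paths}

# Exogenous: no arrow points at it. Endogenous: at least one arrow does.
exogenous = [v for v in latent_variables if v not in receives_arrow]
endogenous = [v for v in latent_variables if v in receives_arrow]


def paths_between(source, target, edges):
    """Return every directed route from source to target as a list of nodes."""
    found = []

    def walk(node, trail):
        if node == target:
            found.append(trail)
            return
        for start, end in edges:
            if start == node and end not in trail:
                walk(end, trail + [end])

    walk(source, [source])
    return found


routes = paths_between("LV1", "LV3", structural_paths)
direct_routes = [r for r in routes if len(r) == 2]    # a single arrow
indirect_routes = [r for r in routes if len(r) > 2]   # via a mediator

print("exogenous:", exogenous)
print("endogenous:", endogenous)
print("direct:", direct_routes)
print("indirect:", indirect_routes)
```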
The models can be of different types
Over time, structural equation modeling has grown into a relevant methodology for building linkages between variables and presenting a more visual outlook of their relationships. Examining key terminologies such as latent variable, observed variable, measurement error, covariance, and estimates provides a basic understanding of how SEM works.
References
- Kenny, D. A. (2011). Terminology and Basics of SEM. Constraints, 1–4.
- Meyer, J. (2020). First Steps in Structural Equation Modeling: Confirmatory Factor Analysis – The Analysis Factor. The Analysis Factor. https://www.theanalysisfactor.com/sem-first-step-confirmatory-factor-analysis-2/
- Suhr, D. (2006). The Basics of Structural Equation Modeling. At University of Northern Colorado. https://doi.org/10.1007/s007840050036
- USF. (2021). Structural equation modeling (SEM). https://doi.org/10.4324/9781315814919-21