What is Structural Equation Modeling (SEM) analysis?

By Riya Jain & Abhinash on February 10, 2020

Structural Equation Modeling (SEM) analysis is a statistical method used in social science studies for testing the linkages between multiple variables simultaneously. SEM analysis is an extension of the simple linear model.

For example, many factors affect the organizational commitment and the job satisfaction level of an employee. If the impact of organizational commitment and job satisfaction on perceived performance needs to be determined, using a simple linear model would be complex and difficult. SEM analysis makes it easier to link these factors and derive the results. The model formulated to state these linkages is shown in the figure below.

Figure 1: Sample SEM Model

Why use SEM analysis?

Simpler statistical testing methods like regression and factor analysis link the dependent and independent variables. However, when there is a large number of causal variables, or when other factors influence the independent variables, the analysis becomes complicated. SEM analysis helps in studying the complex direct and indirect relationships between the causal variables in a single model.
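To make direct and indirect effects concrete, here is a minimal NumPy sketch of a path model built on the introductory example. The structure and numbers are assumptions for illustration only: job satisfaction affects perceived performance directly and also indirectly through organizational commitment, with made-up path coefficients 0.6, 0.5 and 0.2. Each equation is estimated separately by ordinary least squares; for a saturated recursive model with observed variables like this one the point estimates should match what an SEM program reports, but SEM software estimates all equations jointly and can also include latent variables.

```python
import numpy as np

def ols(y, *predictors):
    """Return the OLS slopes of y on the given predictors (intercept dropped)."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

rng = np.random.default_rng(42)
n = 5000
A, B, C_DIRECT = 0.6, 0.5, 0.2  # hypothetical true path coefficients

# Simulate data from the assumed path model
job_sat = rng.normal(size=n)
commitment = A * job_sat + rng.normal(size=n)                            # job_sat -> commitment
performance = C_DIRECT * job_sat + B * commitment + rng.normal(size=n)  # both -> performance

# One regression per dependent (endogenous) variable
(a_hat,) = ols(commitment, job_sat)
c_hat, b_hat = ols(performance, job_sat, commitment)

indirect = a_hat * b_hat   # effect transmitted through commitment
total = c_hat + indirect   # direct + indirect effect
print(f"a={a_hat:.2f} b={b_hat:.2f} direct={c_hat:.2f} "
      f"indirect={indirect:.2f} total={total:.2f}")
```

The indirect effect is the product of the two paths (a × b), and the total effect is the direct plus the indirect effect. With OLS, this total is exactly the slope obtained by regressing performance on job satisfaction alone, which is a handy sanity check.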

SEM analysis determines the linkages between variables by including both measurable and non-measurable variables. For example, psychological variables like intelligence level and consumer attitude towards a brand cannot be observed directly, but they can be measured in numerical terms using a scale. In such cases, SEM analysis provides the opportunity to connect these qualitative variables with the factors affecting them.

The terms used to represent the different structures of linkage between the variables, and to define the nature of the variables, relationships, errors, and models, are:

Non-measurable variables are referred to as latent variables,
Independent variables are referred to as exogenous variables,
Validation of a hypothesized relationship between the observed variables and the latent variables they measure is termed confirmatory factor analysis,
Error or bias in computing the value of a non-measurable variable from the measurable variables is termed measurement error, and
The linkage between the non-measurable variables is defined as the structural model.
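These terms correspond to the equations conventionally used to write an SEM model. The block below uses the widely used LISREL notation; the symbols are an addition for illustration and are not taken from the figures in this article.

```latex
\begin{aligned}
\mathbf{x} &= \Lambda_x \boldsymbol{\xi} + \boldsymbol{\delta}
  && \text{measurement model for the exogenous latent variables } \boldsymbol{\xi} \\
\mathbf{y} &= \Lambda_y \boldsymbol{\eta} + \boldsymbol{\varepsilon}
  && \text{measurement model for the endogenous latent variables } \boldsymbol{\eta} \\
\boldsymbol{\eta} &= B\boldsymbol{\eta} + \Gamma\boldsymbol{\xi} + \boldsymbol{\zeta}
  && \text{structural model}
\end{aligned}
```

Here x and y are the observed variables, the Λ matrices hold the factor loadings linking observed to latent variables, δ and ε are the measurement errors, B and Γ hold the coefficients linking the latent variables, and ζ is the residual error of the structural model.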

The figure below explains each of these terms, which are commonly used in SEM analysis.

Figure 2: Main terminologies of SEM analysis

Assumptions of the SEM model which need to be fulfilled to derive accurate results

A large sample size is required for reliable results, i.e. the sample size should be at least 100.
Variables in SEM programs are assumed to be continuous and normally distributed. However, data obtained from a Likert scale, as in the case of self-esteem, can still be used: although the recorded data are not continuous, the underlying variable is.
The relationship between the variables should be linear in nature.
There should be no outliers.
Each latent variable should be measured by multiple observed variables, i.e. at least three.
Multicollinearity should not be present in the model.
Missing data should be avoided. If less than 5% of the total data is missing, the incomplete cases can be deleted, but if more than 5% is missing, the maximum likelihood estimation method should be adopted.
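Several of these rules of thumb can be checked before fitting a model. The following is a minimal NumPy sketch, assuming the data sit in an array with NaN marking missing cells and that the analyst supplies which columns measure which latent variable. The function name check_sem_assumptions and the 0.9 correlation cut-off for flagging multicollinearity are assumptions for illustration; the thresholds of 100 cases, three indicators, and 5% missing data come from the list above. Normality, linearity, and outliers would need separate checks.

```python
import numpy as np

def check_sem_assumptions(data, indicators, min_n=100, max_missing=0.05, max_r=0.9):
    """Screen a data matrix against some of the SEM rules of thumb.

    data: 2-D array (rows = respondents, columns = observed variables), NaN = missing.
    indicators: dict mapping each latent variable name to its list of column indices.
    max_r: hypothetical cut-off for flagging a pair of columns as collinear.
    """
    report = {}
    report["sample_size_ok"] = data.shape[0] >= min_n
    # At least three observed variables per latent variable
    report["indicators_ok"] = {lv: len(cols) >= 3 for lv, cols in indicators.items()}
    # Share of all cells that are missing decides the suggested strategy
    missing_share = float(np.isnan(data).mean())
    report["missing_share"] = missing_share
    report["missing_strategy"] = ("listwise deletion" if missing_share < max_missing
                                  else "maximum likelihood estimation")
    # Pairwise correlations on complete rows to flag multicollinearity
    complete = data[~np.isnan(data).any(axis=1)]
    r = np.corrcoef(complete, rowvar=False)
    p = r.shape[0]
    report["collinear_pairs"] = [(i, j) for i in range(p) for j in range(i + 1, p)
                                 if abs(r[i, j]) > max_r]
    return report

# Usage with simulated data: two latent variables, three indicators each
rng = np.random.default_rng(0)
n = 200
lv1, lv2 = rng.normal(size=(2, n))
cols = ([lv1 + 0.5 * rng.normal(size=n) for _ in range(3)]
        + [lv2 + 0.5 * rng.normal(size=n) for _ in range(3)])
data = np.column_stack(cols)
data[:, 5] = data[:, 4] + 0.05 * rng.normal(size=n)  # near-duplicate item, should be flagged
data[rng.random(data.shape) < 0.02] = np.nan          # about 2% of cells missing
report = check_sem_assumptions(data, {"LV1": [0, 1, 2], "LV2": [3, 4, 5]})
print(report)
```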

Visual representation

With the meaning of each technical term discussed above, this section provides a visual explanation of the terms.

Figure 3: Path Diagram (Malkanthie, 2015)

In the above figure ME1, ME2, ME3, ME4, and ME5 represent the measurement error; OV1, OV2, OV3, OV4, and OV5 are the observed variables; LV1 and LV2 show the latent variables, and RE1 depicts the residual error.

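Furthermore, the above figure contains two measurement models, one linking each latent variable (LV1 and LV2) to its observed variables, and one structural model, which is the linkage between LV1 and LV2.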

Benefits of the SEM analysis

Compared to regression analysis and factor analysis, SEM analysis offers the following benefits:

Test theoretical propositions using non-experimental data, i.e. theoretical variables like the job satisfaction of employees or the attitude of consumers can be studied simply by including the factors that affect them. No experimental study is required specifically to derive the numerical value of a non-measurable variable.
Include observed as well as latent variables in the analysis, i.e. SEM analysis allows both measurable and non-measurable variables in the same model. For example, SEM analysis can determine the impact of measurable factors like price, sales, or discount on a non-measurable factor such as customer attitude.
Assumptions of the model are flexible, i.e. problems like non-normal distributions, discontinuous data, and incomplete datasets can be handled through modifications.
Confirmatory rather than exploratory factor analysis is emphasized, i.e. the focus of SEM analysis is to validate the hypothesized linkages between the variables instead of exploring which model is more suitable. Although SEM analysis also helps in comparing models and identifying the better-fitting one, its focus is on testing the reliability and validity of the model.
Results are more reliable because measurement error is included. As there is a large possibility of error when computing the value of a factor in social science, the confirmatory factor analysis in SEM analysis includes a measurement error term for each computed variable, which reduces bias.
Models with multiple dependent variables can be studied simultaneously. A single linear or multiple regression equation can only test the relationship of the independent variables with one dependent variable, so the process must be repeated for each additional dependent variable, and with a large number of independent variables the analysis becomes complicated. SEM analysis overcomes these limitations of regression analysis by running models with many dependent and independent variables at the same time.
The overall fit of the model can be tested even when it contains multiple relationships. Information about the accuracy and validity of a model is often difficult to obtain when there are multiple linkages, i.e. various factors or mediating variables. SEM analysis helps to verify model accuracy by providing absolute fit, incremental fit, and parsimonious fit indices.
Simultaneous comparisons can be made based on means, variances, or regression coefficients, i.e. since SEM analysis enables different models to be analyzed at the same time, their results can easily be compared.
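The benefit of including measurement error can be shown with a small simulation. This minimal NumPy sketch assumes a latent predictor with a true effect of 0.5 on the outcome, measured by a single indicator whose reliability is 0.6 (both values are made up). Regressing the outcome on the noisy indicator underestimates the effect (attenuation bias); dividing by the reliability, which is what a model with a fixed measurement-error variance amounts to in this single-indicator case, recovers it. In practice, SEM estimates the measurement error from multiple indicators rather than assuming the reliability is known.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
TRUE_SLOPE = 0.5   # assumed effect of the latent variable on the outcome
RELIABILITY = 0.6  # assumed share of indicator variance that is true-score variance

xi = rng.normal(size=n)                                # latent variable, variance 1
error_sd = np.sqrt((1 - RELIABILITY) / RELIABILITY)    # makes var(x) = 1 / RELIABILITY
x = xi + error_sd * rng.normal(size=n)                 # observed indicator with measurement error
y = TRUE_SLOPE * xi + rng.normal(size=n)               # outcome driven by the latent variable

naive = np.cov(x, y)[0, 1] / np.var(x, ddof=1)         # OLS slope ignoring measurement error
corrected = naive / RELIABILITY                        # slope after accounting for measurement error
print(f"naive={naive:.3f} corrected={corrected:.3f}")
```

The naive slope lands near 0.5 × 0.6 = 0.3, while the corrected slope lands near the true value of 0.5.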

Limitations of the SEM analysis

Despite being a powerful statistical analysis technique, SEM analysis has certain disadvantages. Issues that arise in SEM analysis are listed below:

The focus of the model is on predicting causal relationships between the variables, i.e. deriving hypothesis-testing results. Non-experimental data, which only show correlations between the variables, are converted into a model representing causal linkages.
Numerous modifications can be made to a model just to attain an acceptable value of a fit index or statistic such as NNFI or chi-square.
Although the assumptions are flexible, their technicality and complexity discourage first-time users from adopting SEM analysis.
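The fit measures mentioned in this article (absolute, incremental, and parsimonious fit indices, NNFI, chi-square) are simple functions of the chi-square statistics that SEM programs report for the target model and for a baseline (independence) model. Below is a minimal Python sketch using the standard textbook formulas for RMSEA, CFI, NNFI (also known as the Tucker-Lewis index), and PNFI. The chi-square values, degrees of freedom, and sample size are hypothetical, and individual packages may differ in details, for example using n instead of n - 1 in RMSEA.

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation, an absolute fit index.

    Uses n - 1 in the denominator; some programs use n instead.
    """
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative fit index, an incremental fit index (target vs. baseline model)."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)  # never below d_m, so CFI stays within [0, 1]
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b

def nnfi(chi2_m, df_m, chi2_b, df_b):
    """Non-normed fit index (Tucker-Lewis index), an incremental fit index."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2_m / df_m) / (ratio_b - 1.0)

def pnfi(chi2_m, df_m, chi2_b, df_b):
    """Parsimony normed fit index: the normed fit index scaled by df_m / df_b."""
    nfi = (chi2_b - chi2_m) / chi2_b
    return (df_m / df_b) * nfi

# Hypothetical output of an SEM program: target model and baseline model
chi2_m, df_m = 85.0, 40
chi2_b, df_b = 900.0, 55
n = 250
print(f"RMSEA={rmsea(chi2_m, df_m, n):.3f}  CFI={cfi(chi2_m, df_m, chi2_b, df_b):.3f}  "
      f"NNFI={nnfi(chi2_m, df_m, chi2_b, df_b):.3f}  PNFI={pnfi(chi2_m, df_m, chi2_b, df_b):.3f}")
```

These indices are only as trustworthy as the model behind them; improving them through repeated modifications without theoretical justification is exactly the limitation noted above.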

Software packages that support SEM analysis

SEM analysis can be performed in the following software packages:

AMOS (commonly used)
LISREL (Traditional method)
EQS (used with a small sample)
Mplus (used when the sample is not large enough)
R
Mx
SEPATH
CALIS

References

Malkanthie, A. (2015) ‘The Basic Concepts of Structural Equation Modeling’, Lap Lambert Academic Publishing, 1(January), p. 55. doi: 10.13140/RG.2.1.1960.4647.