# Statistical tests in descriptive and analytical epidemiology

By Chandrika Kapagunta and Avishek Majumder on September 27, 2017

The previous article established the importance of statistical analysis in epidemiological studies. Statistical analysis can contribute to the strategic planning of public health interventions. This analysis is carried out mainly through mathematical and statistical techniques, each with its own advantages and purposes. Mathematical models study the dynamics of a system, whereas statistical models focus on relationships between variables. In general, epidemiological analysis can be divided into descriptive and analytical epidemiology, depending on the aim of the analysis (Fos, 2010). This article explains the properties and types of tests and models used in descriptive and analytical epidemiology.

## Statistical tests in descriptive epidemiology

Descriptive epidemiology analyzes the existing trends of a disease epidemic with respect to time, place and persons (Aschengrau and Seage, 2013). Its main aim is to evaluate the impact of a disease by analyzing trends in a population. This impact can take the form of mortality and morbidity in host populations. Descriptive epidemiology also uses frequencies to determine the rates at which diseases newly occur, and it identifies patterns in order to isolate possible causal factors (Fos, 2010).

Three main variables studied under descriptive epidemiology are:

1. Person
2. Place
3. Time

Researchers use specific study designs to collect data for descriptive epidemiological studies. These designs provide information on the place, time and persons affected by a disease. They include (Merrill, 2015):

1. Ecological: Involves collecting aggregate data for a population or community group. Researchers compare risk factor prevalence with disease outcomes for that population.
2. Case study: Involves describing the disease for an individual or group using qualitative data.
3. Cross-sectional: Researchers measure multiple variables together at a single point in time for a given group or population.

## Common descriptive measures used in epidemiology

The most commonly used descriptive measures are (Ressing, Blettner and Klug, 2010):

1. Frequencies and comparisons: Frequencies quantify a variable in a population in the form of an ‘epidemiological rate’. Rates are expressed as decimals, percentages, or events per standard unit of population. A rate is calculated by dividing the number of affected (or non-affected) individuals by the total study population. Comparisons study the effect of determinants on disease frequencies, and can be carried out over time or across different population groups.
2. Measures of association: These are used to study the association between two variables with respect to a disease outcome or risk factor, which can lead to the development of a hypothesis. Common measures include the Pearson correlation coefficient, the coefficient of determination and Spearman’s rank correlation coefficient.

After summarizing the data, reports use tables and charts to describe them visually. These show the trends or patterns of the population with respect to a variable. Commonly used charts include:
• Bar charts
• Histogram
• Frequency polygon
• Epidemic curve
• Stem & leaf plot
• Bivariate scatter plot
• Spot map
• Line graph
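
The frequency and association measures described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library. The function names (`rate`, `pearson`, `ranks`, `spearman`) and the district figures are hypothetical and are not taken from the cited sources.

```python
import math


def rate(events, population, per=1000):
    """Epidemiological rate: events divided by the study population,
    scaled to a standard unit of population (here, per 1000 by default)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events / population * per


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical aggregate data for five districts (illustrative values only).
cases = [30, 41, 66, 70, 98]
population = [12000, 15000, 20000, 18000, 21000]
exposure_pct = [12.0, 18.0, 25.0, 31.0, 40.0]  # risk factor prevalence (%)

incidence = [rate(c, p, per=10000) for c, p in zip(cases, population)]
r = pearson(exposure_pct, incidence)
print("cases per 10,000:", [round(v, 1) for v in incidence])
print("Pearson r:", round(r, 3))
print("coefficient of determination:", round(r ** 2, 3))
print("Spearman rho:", round(spearman(exposure_pct, incidence), 3))
```

A strong correlation in data of this kind suggests a hypothesis to test further; on its own it does not establish that the risk factor causes the disease.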

## Classification of models under analytical epidemiology

The table below classifies the mathematical and statistical models used in epidemiological analysis by type and purpose (Chubb and Jacobsen, 2010; Dimitrov and Meyers, 2010; Chen, 2014).

##### Mathematical models

| Type of model | Models used | Purpose |
|---|---|---|
| Classical mathematical models | Compartmental models (e.g. SIR, SIS, SITR, SEIR) | Used when individuals of a population are segregated according to their state of infection (Susceptible, Infectious, Carrier, etc.). |
| New mathematical models | Contact network modelling and agent-based simulations | High-fidelity models that incorporate secondary parameters such as the movement of hosts, contact and interactions between them, age groups, etc. |
| Spatiotemporal mathematical models | Travelling-wave model, nearest-neighbor mixing model, wavefront model | A more complex version of classical mathematical models, in which spatial and temporal data can be modelled together. |

##### Statistical models

| Type of model | Models used | Purpose |
|---|---|---|
| Probability or stochastic models | Binary probability model, parameter estimation, Markov chain Monte Carlo methods, Bayesian models | Used when the model to be tested is dynamic and the output is a range of possible outcomes rather than a single outcome. |
| Spatial models | Disease mapping, disease clustering, ecological analysis | Used when data are analysed based on the locations of events or risks (national, state or district level). |
| Spatio-temporal statistical models | Autologistic models, Latent structure LT models | Used when both spatial and temporal data are analysed for a disease, to test whether its spatial distribution changes over a period of time. |
| Predictive modelling | Time series or forecasting models, regression models, artificial neural network models | Used to predict future incidents or outcomes, or the prevalence or spread of a disease across a population or geographical region. |
| Network-based models | Metapopulation-based models, network models | The central assumption is that diseases spread through human-to-human contact, via travel, transportation or social contact networks. |
| Computational simulation models | Cellular automata simulation, field simulation modelling, individual or agent-based modelling | Used to simulate the spread of infectious diseases in spatially structured in-silico environments. |
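
As an illustration of the classical compartmental models listed above, the following is a minimal sketch of a deterministic SIR model, assuming a closed population and the textbook Susceptible, Infectious and Recovered compartments. The function name `simulate_sir`, the parameter values and the simple Euler time-stepping are assumptions made for this example and are not drawn from the cited sources.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the SIR equations with fixed-step Euler updates.

    beta  : transmission rate per day (assumed value supplied by the caller)
    gamma : recovery rate per day (1 / average infectious period)
    Returns a list of (day, S, I, R) tuples, one entry per whole day.
    """
    if dt <= 0 or dt > 1:
        raise ValueError("dt must be in (0, 1]")
    n = s0 + i0 + r0  # closed population: no births, deaths or migration
    s, i, r = float(s0), float(i0), float(r0)
    history = [(0, s, i, r)]
    steps_per_day = round(1 / dt)
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((day, s, i, r))
    return history


# Hypothetical outbreak: 1 infectious person in a population of 1000.
trajectory = simulate_sir(beta=0.3, gamma=0.1, s0=999, i0=1, r0=0, days=160)
peak_day, _, peak_infectious, _ = max(trajectory, key=lambda row: row[2])
print("peak day:", peak_day)
print("infectious at peak:", round(peak_infectious))
print("recovered by the end:", round(trajectory[-1][3]))
```

Euler stepping keeps the sketch dependency-free. For real analyses a dedicated ODE solver would be the usual choice, and the SIS, SEIR and related variants differ mainly in which compartments and transitions are included.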

These statistical methods are run on a wide range of commercial and open-source tools and software. The most noteworthy are R, GIS software (for spatial modelling), Stata and SPSS.

## Future prospects in disease evaluation

Researchers are developing mathematical and statistical models with improved disease evaluation capabilities. In future, however, the field needs models capable of better and faster detection of disease epidemics. Such models are important in controlling highly contagious diseases, especially viral ones. Researchers also need to develop models for real-time analysis of an ongoing disease outbreak, which would help predict the extent of treatment facilities required. Moreover, disease prediction needs highly efficient and sophisticated forecasting models. Models based on neural networks or machine learning would help with real-time monitoring, since machine learning allows predictions about disease outbreaks to be updated continuously as new data arrive. These models can also assist in predicting the spread and emergence of new pathogenic variants.
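
As a deliberately simple point of reference for the forecasting models discussed above, the sketch below applies simple exponential smoothing to a series of case counts, updating the forecast each time a new count arrives. It is far less sophisticated than the neural network or machine learning models the article has in mind. The function name `smoothed_forecasts`, the smoothing weight and the weekly counts are hypothetical.

```python
def smoothed_forecasts(counts, alpha=0.5):
    """One-step-ahead forecasts by simple exponential smoothing.

    forecasts[t] is the forecast for period t + 1, made after observing
    counts[0..t]. alpha is the weight given to the newest observation.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not counts:
        return []
    level = float(counts[0])  # initialise the level at the first observation
    forecasts = []
    for count in counts:
        # Blend the newest count with the previous level.
        level = alpha * count + (1 - alpha) * level
        forecasts.append(level)
    return forecasts


# Hypothetical weekly case counts from an ongoing outbreak.
weekly_cases = [4, 7, 12, 20, 31, 45]
forecasts = smoothed_forecasts(weekly_cases, alpha=0.6)
print("forecast for next week:", round(forecasts[-1], 1))
```

Because this method only tracks a level, it lags behind a rapidly growing outbreak, which is one reason the more sophisticated forecasting models called for above are needed.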

#### References

• Aschengrau, A. and Seage, G. R. (2013) Essentials of Epidemiology in Public Health. Jones & Bartlett Publishers.
• Chen, D. (2014) ‘Modeling the Spread of Infectious Diseases: A Review’, in Chen, D., Moulin, B., and Wu, J. (eds) Analyzing and Modeling Spatial and Temporal Dynamics of Infectious Diseases. First. John Wiley & Sons, Inc., pp. 19–42.
• Chubb, M. C. and Jacobsen, K. H. (2010) ‘Mathematical modeling and the epidemiological research process’, European Journal of Epidemiology, 25(1), pp. 13–19.
• Dimitrov, N. B. and Meyers, L. A. (2010) ‘Mathematical Approaches to Infectious Disease Prediction and Control’, in Risk and Optimization in an Uncertain World. INFORMS, pp. 1–25.
• Fos, P. J. (2010) Epidemiology Foundations: The Science of Public Health. John Wiley & Sons.
• Merrill, R. M. (2015) Introduction to Epidemiology. Seventh. Jones & Bartlett Publishers.
• Ressing, M., Blettner, M. and Klug, S. J. (2010) ‘Data analysis of epidemiological studies: Part 11 of a series on evaluation of scientific publications’, Deutsches Ärzteblatt International, 107(11), pp. 187–192.
• Ughade, S. (2013) ‘Statistical modeling in epidemiologic research: Some basic concepts’, Clinical Epidemiology and Global Health, 1(1), pp. 32–36.