# Statistical tests in descriptive and analytical epidemiology

In the previous article, the importance of statistical analysis in epidemiological studies was established. Statistical analysis can support the strategic planning of public health interventions. This analysis is mainly carried out through mathematical and statistical techniques, each of which offers unique advantages and serves different purposes. Mathematical models study the dynamics of a system, whereas statistical models focus on the relationships between variables. In general, epidemiological analysis can be divided into descriptive and analytical epidemiology, depending on the aim of the analysis (Fos, 2010). This article explains the properties and types of tests and models used in descriptive and analytical epidemiology.

## Statistical tests in descriptive epidemiology

Descriptive epidemiology analyzes the existing trends of a disease epidemic with respect to time, place and person (Aschengrau and Seage, 2013). Its main aim is to evaluate the impact of a disease, in the form of mortality and morbidity in host populations, by analyzing trends in a population. It also uses frequencies to determine the rates at which new cases of disease occur. Finally, it identifies patterns that point to possible causal factors (Fos, 2010).

Three main variables studied under descriptive epidemiology are:

1. Person
2. Place
3. Time

Researchers use specific study designs to collect data for descriptive epidemiological studies. These designs provide information on the place, time and persons affected by a disease. They include (Merrill, 2015):

1. Ecological: involves collecting aggregate data for a population or community group. Researchers compare risk factor prevalence with disease outcomes for that population.
2. Case study: involves describing the disease in an individual or group using qualitative data.
3. Cross-sectional: researchers measure multiple variables at a single point in time for a given group or population.

## Common descriptive measures used in epidemiology

The most commonly used descriptive measures are (Ressing, Blettner and Klug, 2010):

1. Frequencies and comparisons: Frequencies quantify a variable in a population in the form of an ‘epidemiological rate’. Rates are expressed as decimals, percentages, or events per standard unit of population (for example, per 100,000 people). A rate is calculated as the number of affected (or non-affected) individuals divided by the total study population. Comparisons involve studying the effect of determinants on disease frequencies, either over time or across different population groups.
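As a minimal sketch of the rate calculation described above, the Python below divides cases by the study population and rescales the result to a decimal, a percentage or a rate per 100,000. The two groups and their counts are hypothetical, and the rate ratio and rate difference are assumed here as two simple ways of comparing groups; the source does not name specific comparison measures. Note that this follows the definition above (cases over total study population); incidence rates based on person-time would need a different denominator.

```python
def rate(cases: int, population: int, per: int = 1) -> float:
    """Cases divided by the study population, expressed per `per` people.

    per=1 gives a decimal, per=100 a percentage, per=100_000 a rate
    per 100,000 population.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so round figures stay exact in floating point.
    return cases * per / population


# Hypothetical counts for two population groups (illustrative only).
cases_a, pop_a = 30, 12_000
cases_b, pop_b = 12, 10_000

rate_a = rate(cases_a, pop_a, per=100_000)  # 250.0 per 100,000
rate_b = rate(cases_b, pop_b, per=100_000)  # 120.0 per 100,000

# Two simple ways to compare the groups (assumed measures, see lead-in).
rate_ratio = rate_a / rate_b          # relative comparison
rate_difference = rate_a - rate_b     # absolute comparison, per 100,000

print(f"Group A: {rate(cases_a, pop_a):.4f} as a decimal, "
      f"{rate(cases_a, pop_a, per=100):.2f}%, {rate_a:.1f} per 100,000")
print(f"Group B: {rate_b:.1f} per 100,000")
print(f"Rate ratio: {rate_ratio:.2f}; "
      f"rate difference: {rate_difference:.1f} per 100,000")
```

The same function covers all three ways of expressing a rate mentioned in the text; only the `per` scaling factor changes.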

Types of quantifying and comparison measures (Source: Ressing, Blettner and Klug, 2010)

2. Measures of association: These are used to study the association between two variables with respect to a disease outcome or risk factor, and can therefore lead to the development of a hypothesis. Common measures include the Pearson correlation coefficient, the coefficient of determination and Spearman’s rank correlation coefficient.

After the data are summarized, reports present tables and charts that describe them visually. These show the trends or patterns in the population with respect to a variable. Commonly used charts include:
• Bar chart
• Histogram
• Frequency polygon
• Epidemic curve
• Stem & leaf plot
• Bivariate scatter plot
• Spot map
• Line graph
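The three measures of association named above can be sketched in plain Python as follows. Pearson's r measures linear association, the coefficient of determination is taken here as r squared (which holds for a simple linear fit between two variables), and Spearman's rho is Pearson's r computed on ranks, with tied values given the average of their ranks. The district data are hypothetical, chosen only to mimic an ecological comparison of risk-factor prevalence against disease rates.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient for paired observations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(ss_x * ss_y)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = shared_rank
        start = end + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical ecological data for five districts (illustrative only):
# risk-factor prevalence (%) and disease rate per 100,000.
prevalence = [12, 18, 25, 30, 41]
disease_rate = [40, 55, 52, 80, 95]

r = pearson(prevalence, disease_rate)
r_squared = r ** 2  # coefficient of determination for a simple linear fit
rho = spearman(prevalence, disease_rate)
print(f"Pearson r = {r:.3f}, r^2 = {r_squared:.3f}, Spearman rho = {rho:.3f}")
```

Spearman's rho is less sensitive than Pearson's r to outliers and non-linear but monotonic relationships, which is one reason both are commonly reported. In practice, statistical packages such as those mentioned later in this article provide these measures directly.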

## Classification of models under analytical epidemiology

The figure below shows the parameters used to select models in epidemiological studies.

Parameters for model selection in Epidemiology studies (Source: Ughade, 2013)

The table below classifies the different mathematical and statistical methods of analysis used in epidemiology (Chubb and Jacobsen, 2010; Dimitrov and Meyers, 2010; Chen, 2014).

| Category | Model type | Examples | Purpose |
|---|---|---|---|
| Mathematical models | Classical mathematical models | Compartmental models (e.g., SIR, SIS, SITR, SEIR) | Individuals of a population are segregated according to their state of infection (susceptible, infectious, carrier, etc.). |
| Mathematical models | New mathematical models | Contact network modeling, agent-based simulations | High-fidelity models that incorporate secondary parameters such as movement of hosts, contacts and interactions between them, age groups, etc. |
| Mathematical models | Spatio-temporal mathematical models | Travelling-wave model, nearest-neighbor mixing model, wave-front model | A more complex version of classical mathematical models in which spatial and temporal data are modeled together. |
| Statistical models | Probability or stochastic models | Binary probability model, parameter estimation, Markov chain Monte Carlo methods, Bayesian models | Used when the model to be tested is dynamic and the output is a range of possible outcomes rather than a single outcome. |
| Statistical models | Spatial models | Disease mapping, disease clustering, ecological analysis | Used when data are analysed based on the locations of events or risks (national, state or district level). |
| Statistical models | Spatio-temporal statistical models | Autologistic models, latent structure (LT) models | Used when spatial and temporal data are analysed together to test for changes in the spatial distribution of a disease over time. |
| Statistical models | Predictive models | Time series or forecasting models, regression models, artificial neural network models | Used to predict future incidence or outcomes, or the prevalence or spread of disease across a population or geographical region. |
| Statistical models | Network-based models | Metapopulation models, network models | The central assumption is that diseases spread through human-to-human contact, whether through travel, transportation or social contact networks. |
| Statistical models | Computational simulation models | Cellular automata simulation, field simulation modeling, individual- or agent-based modeling | Used to simulate the spread of infectious diseases in spatially structured in-silico environments. |
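To make the classical compartmental approach more concrete, the sketch below integrates a basic SIR model for a closed population (no births, deaths or migration) using simple forward-Euler steps. The transmission rate `beta`, recovery rate `gamma`, initial fractions and time step are hypothetical values chosen for illustration, and the function `simulate_sir` is not taken from any of the cited sources; real analyses would typically fit parameters to data and use a more accurate ODE solver.

```python
def simulate_sir(beta: float, gamma: float, s0: float, i0: float, r0: float,
                 days: int, dt: float = 0.1) -> list:
    """Integrate a closed-population SIR model with forward-Euler steps.

    S, I and R are fractions of the population (S + I + R = 1):
        dS/dt = -beta * S * I
        dI/dt =  beta * S * I - gamma * I
        dR/dt =  gamma * I
    Returns one (day, S, I, R) tuple per whole day, starting at day 0.
    """
    steps_per_day = round(1 / dt)
    if abs(steps_per_day * dt - 1) > 1e-9:
        raise ValueError("dt must divide one day evenly")
    s, i, r = s0, i0, r0
    history = [(0, s, i, r)]
    for step in range(1, days * steps_per_day + 1):
        new_infections = beta * s * i * dt  # S -> I flow during this step
        recoveries = gamma * i * dt         # I -> R flow during this step
        s -= new_infections
        i += new_infections - recoveries
        r += recoveries
        if step % steps_per_day == 0:
            history.append((step // steps_per_day, s, i, r))
    return history


# Hypothetical parameters: beta = 0.3/day, gamma = 0.1/day, so R0 = beta/gamma = 3.
history = simulate_sir(beta=0.3, gamma=0.1, s0=0.999, i0=0.001, r0=0.0, days=160)
peak_day, _, peak_i, _ = max(history, key=lambda row: row[2])
final_day, s_end, i_end, r_end = history[-1]
print(f"Peak prevalence {peak_i:.1%} on day {peak_day}")
print(f"Day {final_day}: {s_end:.1%} never infected, {r_end:.1%} recovered")
```

Each step only moves individuals between compartments, so S + I + R stays constant, which reflects the segregation of the population by infection state described in the table. Variants such as SIS or SEIR change the flows between compartments rather than the overall structure.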

These methods are implemented in a wide range of commercial and open-source tools and software. The most noteworthy are R, GIS software (for spatial modeling), Stata and SPSS.

## Future Prospects in disease evaluation

Researchers continue to develop mathematical and statistical models with better disease evaluation properties. In future, however, the field needs models capable of better and faster detection of disease epidemics. Such models are important in controlling highly contagious diseases, especially viral ones. Researchers also need to develop models for real-time analysis of an ongoing outbreak, which would help predict the extent of treatment facilities required. Disease prediction also needs highly efficient and sophisticated forecasting models. Finally, models based on neural networks or machine learning would help real-time monitoring, since machine learning allows predicted values to be updated continuously as an outbreak evolves. These models can also assist in predicting the spread and emergence of new pathogenic variants.

### References

• Aschengrau, A. and Seage, G. R. (2013) Essentials of Epidemiology in Public Health. Jones & Bartlett Publishers.
• Chen, D. (2014) ‘Modeling the Spread of Infectious Diseases: A Review’, in Chen, D., Moulin, B., and Wu, J. (eds) Analyzing and Modeling Spatial and Temporal Dynamics of Infectious Diseases. 1st edn. John Wiley & Sons, Inc., pp. 19–42.
• Chubb, M. C. and Jacobsen, K. H. (2010) ‘Mathematical modeling and the epidemiological research process’, European Journal of Epidemiology, 25(1), pp. 13–19.
• Dimitrov, N. B. and Meyers, L. A. (2010) ‘Mathematical Approaches to Infectious Disease Prediction and Control’, in Risk and Optimization in an Uncertain World. INFORMS, pp. 1–25.
• Fos, P. J. (2010) Epidemiology Foundations: The Science of Public Health. John Wiley & Sons.
• Merrill, R. M. (2015) Introduction to Epidemiology. 7th edn. Jones & Bartlett Publishers.
• Ressing, M., Blettner, M. and Klug, S. J. (2010) ‘Data analysis of epidemiological studies: Part 11 of a series on evaluation of scientific publications’, Deutsches Ärzteblatt International, 107(11), pp. 187–192.
• Ughade, S. (2013) ‘Statistical modeling in epidemiologic research: Some basic concepts’, Clinical Epidemiology and Global Health, 1(1), pp. 32–36.

### Chandrika Kapagunta

Research Analyst at Project Guru
Chandrika is a nature enthusiast with special love for the marine world. Her Master’s degree in Marine Biotechnology and scuba diving experience have made her a strong advocate of environment and marine conservation, especially through bioremediation. She believes in finding solutions to everyday human problems in nature, be it medicines, technology or philosophy. Having worked as a volunteer at The Bombay Natural History Society and as a Senior Research Fellow at Central Institute of Fisheries Education, she has had exposure to the current state of academic research, specifically in the field of environmental biotechnology.
