# Time series and forecasting models in disease epidemiology

By Chandrika Kapagunta and Avishek Majumder on September 25, 2017

Time series analysis refers to the analysis of time-dependent observations, that is, observations whose values depend on the time at which they were recorded. Time intervals can be minutes, hours, days, months or years. Observing the trends of these events over a long period can reveal hidden relationships, and the same analysis can be used to predict future trends. Time series models used to predict future disease trends are known as forecasting models. In epidemiology, forecasting is important for understanding how a disease spreads over time. Forecasting models also help anticipate future epidemics using related factors such as environment, vector density or socioeconomic conditions. This article explains the role of forecasting models in epidemiology and explores the different types of statistical models available.

## Forecasting in epidemiology research

Temporal heterogeneity often occurs in the form of seasonal variation in epidemics. It arises from meteorological conditions, pathogen prevalence or virulence, and host behaviour (Dowell, 2001). Such temporal patterns can be analysed using time series methods. It is important to study these trends and identify the variables associated with their variation, because doing so allows future trends to be predicted more accurately. Time series methods involve collecting quantitative measurements at regular intervals, which researchers gather through repeated observation (Merrill, 2009). The trends a time series analysis can deduce therefore depend on the time period studied.

During an epidemic, time series analysis can be conducted at the individual level (longitudinal data) as well as at the group level (ecologic data). Longitudinal data follow the same group of people over successive intervals, whereas ecologic data consist of grouped or aggregated data for geographical areas over a period of time (Wakefield, 2008). Both types of data are useful in epidemiology, depending on the research aims.
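The difference between the two data types can be sketched in Python. The snippet below is a minimal illustration, assuming hypothetical individual case records that carry a district and an onset date; it aggregates them into district-by-week counts, which is the kind of ecologic series most forecasting models take as input. The record layout and the `to_ecologic` helper are invented for this example.

```python
from collections import Counter
from datetime import date

# Hypothetical individual-level case records: (case_id, district, onset_date).
records = [
    (1, "North", date(2017, 1, 2)),
    (2, "North", date(2017, 1, 4)),
    (3, "South", date(2017, 1, 3)),
    (4, "North", date(2017, 1, 10)),
    (5, "South", date(2017, 1, 11)),
    (6, "South", date(2017, 1, 12)),
]

def to_ecologic(records):
    """Aggregate individual cases into counts keyed by (district, ISO year, ISO week)."""
    counts = Counter()
    for _case_id, district, onset in records:
        iso_year, iso_week, _weekday = onset.isocalendar()
        counts[(district, iso_year, iso_week)] += 1
    return dict(counts)

weekly = to_ecologic(records)
```

ISO weeks are used here so that every week starts on a Monday; a reporting calendar with a different week definition would need a different grouping key.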

Forecasting analysis based on time series data does not simply describe existing time-dependent trends; it also helps to extrapolate and predict future ones. For predicting future disease trends, time series models have an advantage over mechanistic models, because mechanistic models need highly specific epidemiological information to fit. Time series models use less information and account for seasonal trends. Unlike mechanistic models, they also capture the rapid fluctuations that diseases show (Zhang et al., 2013). The main types of parameters studied during forecasting are:

• demographic data of host populations
• socioeconomic data of host populations
• climate
• vector properties
• infrastructure data such as vector breeding sites, health facility access and sanitation, and
• pathogen variants circulating within a population (Gharbi et al., 2011).

## Role of forecasting in disease epidemiology

Forecasting analysis in epidemiological studies serves multiple purposes, as shown in the figure below.

Analysing temporal changes in disease patterns, spread and related factors helps predict how a disease will progress over time. The most important role of forecasting models in epidemiology is to support decision making; the lag phase of an epidemic and its subsequent seasonal fluctuations are useful in this regard (Zhang et al., 2013). Forecasting models in epidemiology require details on host and vector populations and on host-pathogen interactions. They therefore need data on the dynamics that increase pathogen multiplication and spillover into human populations (LaDeau et al., 2011). Forecasting models also help in planning health infrastructure: decision makers can prepare by anticipating the number of doctors, hospitals and beds required, and health provisions can be distributed to interior or rural regions of a country.
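As a rough illustration of how a case forecast could feed into bed planning, the sketch below applies Little's law under a steady-state assumption: the average number of occupied beds equals daily admissions multiplied by the mean length of stay. The hospitalisation rate, length of stay, target occupancy and the `beds_needed` helper are all hypothetical values chosen for the example, not figures from the cited studies.

```python
import math

def beds_needed(daily_admissions, mean_stay_days, target_occupancy=0.85):
    """Estimate beds required in steady state.

    Little's law: average occupied beds = admissions per day * mean stay in days.
    Dividing by a target occupancy leaves headroom for day-to-day variation.
    """
    if not 0 < target_occupancy <= 1:
        raise ValueError("target_occupancy must be in (0, 1]")
    occupied = daily_admissions * mean_stay_days
    return math.ceil(occupied / target_occupancy)

# Hypothetical inputs: a forecast of 1,400 cases next week,
# 10% of cases admitted to hospital, and a 5-day mean stay.
forecast_weekly_cases = 1400
hospitalisation_rate = 0.10
daily_admissions = forecast_weekly_cases * hospitalisation_rate / 7
beds = beds_needed(daily_admissions, mean_stay_days=5)
```

Because the steady-state assumption breaks down while an epidemic is growing quickly, a planner would normally repeat this calculation for each forecast period rather than once.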

## Types of forecasting models

Forecasting models are of several types, depending on how the data are treated: smoothing methods, decomposition methods and Box-Jenkins methods. All of these can be applied in univariate and multivariate modelling (Yaffee and McGee, 2000), and all have traditionally been used in epidemiology. The five major types of models applied to epidemic data for forecasting disease progress and outcomes are given in the table below.
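Of the three families, smoothing methods are the simplest to illustrate. The sketch below implements simple exponential smoothing, in which each new level is a weighted average of the latest observation and the previous level, controlled by a smoothing constant `alpha`. The weekly counts are hypothetical and the function name is our own; it is not the procedure used in any of the cited studies.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed levels; the last level is the one-step-ahead forecast."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialise the level with the first observation
    levels = [level]
    for y in series[1:]:
        # Blend the new observation with the previous level.
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels

# Hypothetical weekly case counts.
counts = [10, 12, 14, 13, 18, 20]
levels = simple_exponential_smoothing(counts, alpha=0.5)
next_week_forecast = levels[-1]
```

Larger `alpha` values track recent changes faster, while smaller values smooth out more noise. The flat forecast this produces suits series without strong trend or seasonality, which is why trend and seasonal extensions such as the Holt and Holt-Winters methods are often preferred for epidemic data.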

| No. | Model | Details |
|---|---|---|
| **Stationary time series** | | |
| 1 | Autoregressive (AR) | Current value expressed linearly in terms of previous values and the current residual |
| 2 | Moving average (MA) | Current value expressed linearly in terms of the current and previous residuals |
| 3 | Autoregressive moving average (ARMA) | Combination of AR and MA: current value expressed linearly in terms of previous values and the current and previous residuals |
| **Non-stationary time series** | | |
| 4 | Autoregressive integrated moving average (ARIMA) | ARMA model preceded by differencing, which converts non-stationary data to stationary data |
| 5 | Seasonal autoregressive integrated moving average (SARIMA) | ARIMA model that also includes seasonal differencing and seasonal terms, for data with periodic patterns |

Types of forecasting models adopted in epidemiology (Zhang et al., 2013)
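The models in the table can be written compactly in standard Box-Jenkins notation, where $y_t$ is the observed series (for example weekly case counts), $\varepsilon_t$ is white-noise error, $B$ is the backshift operator ($B y_t = y_{t-1}$) and $s$ is the seasonal period (for example 52 for weekly data with a yearly cycle). These are textbook definitions, not notation taken from Zhang et al. (2013).

$$
\begin{aligned}
\text{AR}(p):&\quad y_t = c + \sum_{i=1}^{p} \phi_i\, y_{t-i} + \varepsilon_t \\
\text{MA}(q):&\quad y_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j} \\
\text{ARMA}(p,q):&\quad y_t = c + \sum_{i=1}^{p} \phi_i\, y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j} \\
\text{ARIMA}(p,d,q):&\quad \text{ARMA}(p,q) \text{ applied to } w_t = (1-B)^d\, y_t \\
\text{SARIMA}(p,d,q)(P,D,Q)_s:&\quad \Phi(B^s)\,\phi(B)\,(1-B)^d\,(1-B^s)^D\, y_t = c + \Theta(B^s)\,\theta(B)\,\varepsilon_t
\end{aligned}
$$

Here $\phi(B) = 1 - \sum_{i=1}^{p}\phi_i B^i$ and $\theta(B) = 1 + \sum_{j=1}^{q}\theta_j B^j$, so that $\phi(B)\,y_t = c + \theta(B)\,\varepsilon_t$ reproduces the ARMA line above; $\Phi$ and $\Theta$ are the analogous seasonal polynomials of orders $P$ and $Q$ in $B^s$.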

These models help predict future trends in outcomes, risks and distribution or spread patterns of diseases such as malaria, Ebola, influenza and dengue, among other infectious diseases (Zhang et al., 2013).
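To make the differencing idea behind ARIMA and SARIMA concrete, the sketch below fits an ARIMA(1,1,0) model by hand: it differences the series once, estimates an AR(1) model on the differences by ordinary least squares, forecasts the differences, and adds them back to recover forecasts on the original scale. The `difference` helper also accepts a seasonal lag, which is the seasonal differencing step used in SARIMA. This is a simplified sketch with hypothetical helper names and data; statistical packages usually estimate ARIMA models by maximum likelihood and select the orders from the data.

```python
def difference(series, lag=1):
    """Return y_t - y_{t-lag}; lag=1 is ordinary differencing, lag=s is seasonal differencing."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def fit_ar1(x):
    """Ordinary least-squares estimates of c and phi in x_t = c + phi * x_{t-1} + e_t."""
    prev, curr = x[:-1], x[1:]
    n = len(prev)
    mean_prev, mean_curr = sum(prev) / n, sum(curr) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    if sxx == 0:
        raise ValueError("cannot estimate phi from a constant series")
    sxy = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    return c, phi

def forecast_arima110(series, steps):
    """ARIMA(1,1,0): fit AR(1) to first differences, forecast them, then undo the differencing."""
    diffs = difference(series, lag=1)
    c, phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = c + phi * last_diff  # forecast the next difference
        level = level + last_diff        # integrate back to the original scale
        forecasts.append(level)
    return forecasts

# Hypothetical weekly counts whose week-on-week increases decay toward 2.
series = [0, 4, 7, 9.5, 11.75, 13.875]
forecasts = forecast_arima110(series, steps=2)

# Seasonal differencing with period 4 removes a repeating four-step pattern.
seasonal_diffs = difference([5, 9, 3, 1, 6, 10, 4, 2], lag=4)
```

The example increases were constructed to follow $d_t = 1 + 0.5\,d_{t-1}$ exactly, so the fit recovers $c = 1$ and $\phi = 0.5$, and the two forecasts (15.9375 and 17.96875) keep adding increases that settle toward 2 per week.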

## Further uses of forecasting techniques in epidemiology

While forecasting helps predict population trends related to a disease, it can also be used to predict extreme events. Examples include community epidemics, accidental exposure to toxins and extreme weather such as storms or cyclones (Lerch, 2012). In such cases, probabilistic models are popular because they are reliable and applicable to unique and extreme data.

These methods include quantile regression models (QRM) and fractional polynomial models (FPM). QRM determines the relationship between predictor variables and quantiles of a dependent variable (disease outcome, mortality, number of affected individuals etc.), rather than only its mean. FPM, on the other hand, is useful for studying specific population groups (Soyiri and Reidpath, 2013). Forecasting models used for other purposes include the following (Lerch, 2012):

#### Probabilistic models

1. Quantile regression models (QRM).
2. Fractional or multi-step polynomial transformation models (FPM).
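Quantile regression replaces the squared-error loss of ordinary regression with the asymmetric "pinball" (check) loss, so that the fitted value estimates a chosen quantile `tau` of the outcome instead of its mean. The sketch below shows the idea in its simplest, intercept-only form, using hypothetical daily death counts that include one extreme day. A QRM with predictor variables would minimise the same loss over regression coefficients, usually by linear programming (statsmodels, for instance, provides a `QuantReg` class); the helper names here are our own.

```python
def pinball_loss(y, q_hat, tau):
    """Check (pinball) loss for quantile level tau in (0, 1)."""
    diff = y - q_hat
    # Under-predictions are weighted by tau, over-predictions by (1 - tau).
    return tau * diff if diff >= 0 else (tau - 1) * diff

def fit_intercept_quantile(ys, tau):
    """Intercept-only quantile regression: the constant with the lowest total pinball loss.

    The total loss is piecewise linear and convex in the constant, with kinks at the
    observations, so searching over the observed values is enough to find a minimiser.
    """
    return min(sorted(ys), key=lambda q: sum(pinball_loss(y, q, tau) for y in ys))

# Hypothetical daily death counts with one extreme day (30).
deaths = [3, 5, 2, 8, 4, 30, 6, 7, 5, 4]
median = fit_intercept_quantile(deaths, 0.5)
upper_quartile = fit_intercept_quantile(deaths, 0.75)
mean = sum(deaths) / len(deaths)
```

The fitted median (5) is barely affected by the extreme day, whereas the mean is pulled up to 7.4; this robustness to unusual values is part of why quantile-based approaches suit rare and extreme events.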

#### Spatio-temporal models

1. Bayesian spatio-temporal models.
2. Generalised linear mixed models.

#### Artificial neural network models

1. Back-propagation neural networks (BPNN).
2. Radial Basis Function Neural Networks (RBFNN).
3. Elman recurrent neural networks (ERNN).
4. Fuzzy neural network algorithms.

## Spatio-temporal analysis: a growing forecasting method

Epidemiological research also requires studying spatial data related to epidemics or pandemics, because the distribution patterns of an epidemic can help predict which populations may be affected in the future. Such predictions depend on covariate information such as climate and environmental factors, pathogen transmission patterns and human movement (Kottas, Duan and Gelfand, 2008). Geographical Information System (GIS) technology and spatial information are increasingly important in contemporary research, so forecasting models based on spatiotemporal analysis are popular for disease prediction. These models help analyse spatial and temporal patterns and identify high-risk clusters, which enables timely, targeted interventions in specific areas. The most commonly used techniques are Bayesian spatiotemporal models, which aim to produce smoothed maps of disease risk and prevalence based on spatiotemporal autocorrelation. Generalised linear mixed models are also popular (Lowe et al., 2013; Lee and Lawson, 2014).
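A core idea behind the smoothed risk maps produced by Bayesian models is shrinkage: areas with few expected cases have unstable raw rates, so their estimates are pulled toward a prior mean. The sketch below shows this with the conjugate Poisson-gamma model, in which observed counts are Poisson with mean equal to the expected count times a relative risk, and the relative risk has a gamma prior. It deliberately omits the spatial and temporal correlation terms (for example conditional autoregressive priors) that full Bayesian spatiotemporal models add, and the counts and the prior parameters `a` and `b` are hypothetical.

```python
def poisson_gamma_smooth(observed, expected, a, b):
    """Posterior-mean relative risks under a conjugate Poisson-gamma model.

    Model: O_i ~ Poisson(E_i * theta_i), theta_i ~ Gamma(shape=a, rate=b).
    Posterior: theta_i | O_i ~ Gamma(a + O_i, b + E_i), whose mean is (a + O_i) / (b + E_i).
    """
    if a <= 0 or b <= 0:
        raise ValueError("prior shape a and rate b must be positive")
    return [(a + o) / (b + e) for o, e in zip(observed, expected)]

# Hypothetical areas: observed and expected case counts.
observed = [0, 3, 12]
expected = [0.5, 2.0, 10.0]
raw_smr = [o / e for o, e in zip(observed, expected)]  # unstable where E_i is small
smoothed = poisson_gamma_smooth(observed, expected, a=2.0, b=2.0)  # prior mean a / b = 1
```

The smallest area moves from a raw ratio of 0 to 0.8, while the largest area barely changes, which is the smoothing behaviour such maps rely on. In practice the prior parameters would be estimated from the data or given priors of their own.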

## Use of artificial neural networks in disease prediction

Epidemiological data sometimes involve non-linear relationships. In such cases, models based on artificial neural networks can be highly useful for extracting information. Neural network models are also known to be robust, highly adaptive in learning and tolerant of faults (Wang, Wang and Su, 2011). A further advantage is that they suit the continuous nature of biosurveillance data, which requires a model that can keep adapting to new parameters or trends (Wahyunggoro, Permanasari and Chamsudin, 2013). Common neural networks used in disease prediction include the following:

• Back-propagation neural networks (BPNN).
• Radial basis function neural networks (RBFNN).
• Elman recurrent neural networks (ERNN).
• Fuzzy neural network algorithms (Wahyunggoro, Permanasari and Chamsudin, 2013; Zhang et al., 2013).
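To show what "back-propagation" refers to, the sketch below trains a very small BPNN in plain Python to predict the next value of a series from its three previous values. It uses one hidden layer of tanh units and a linear output, and updates the weights by stochastic gradient descent on squared error. The synthetic seasonal series, layer sizes, learning rate and class name are hypothetical choices for illustration; the cited studies used their own architectures and data.

```python
import math
import random

def make_windows(series, n_lags):
    """Turn a series into (previous n_lags values, next value) training pairs."""
    return [(series[t - n_lags:t], series[t]) for t in range(n_lags, len(series))]

class TinyBPNN:
    """One hidden tanh layer and a linear output, trained by back-propagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)  # fixed seed so runs are reproducible
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        output = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, output

    def predict(self, x):
        return self._forward(x)[1]

    def train(self, pairs, lr=0.05, epochs=500):
        for _ in range(epochs):
            for x, y in pairs:
                hidden, y_hat = self._forward(x)
                err = y_hat - y  # gradient of 0.5 * (y_hat - y) ** 2 with respect to y_hat
                for j, h in enumerate(hidden):
                    # Error signal at hidden unit j (tanh' = 1 - tanh ** 2),
                    # computed before w2[j] is updated.
                    delta = err * self.w2[j] * (1 - h * h)
                    self.w2[j] -= lr * err * h
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * delta * xi
                    self.b1[j] -= lr * delta
                self.b2 -= lr * err
        return self

def mse(model, pairs):
    return sum((model.predict(x) - y) ** 2 for x, y in pairs) / len(pairs)

# Hypothetical seasonal incidence, already scaled to the 0.1-0.9 range.
series = [0.5 + 0.4 * math.sin(2 * math.pi * t / 12) for t in range(48)]
pairs = make_windows(series, n_lags=3)
model = TinyBPNN(n_in=3, n_hidden=5)
before = mse(model, pairs)
model.train(pairs)
after = mse(model, pairs)
next_value = model.predict(series[-3:])
```

This sketch measures error on the training windows only; a genuine forecast evaluation would hold out the most recent weeks and compare against simpler baselines such as the Box-Jenkins models above.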

## Need for continuous modification in forecasting methods

Forecasting models based on time series data are highly useful for public health decision making and future risk prediction. Although a wide range of forecasting models is available, Box-Jenkins methods are the most commonly used. However, as the volume of data being collected grows, existing models need to be modified to fit the data. With emerging diseases appearing more frequently, disease event prediction needs to be as accurate and robust as possible. Models based on artificial neural networks have significant scope for analysing this continuously evolving data and can help researchers predict future events in real time.

#### References

• Dowell, S. F. (2001) ‘Seasonal variation in host susceptibility and cycles of certain infectious diseases’, Emerging Infectious Diseases, 7(3), p. 369.
• Gharbi, M. et al. (2011) ‘Time series analysis of dengue incidence in Guadeloupe, French West Indies: Forecasting models using climate variables as predictors’, BMC Infectious Diseases, 11(1), p. 166.
• Kottas, A., Duan, J. A. and Gelfand, A. E. (2008) ‘Modeling disease incidence data with spatial and spatio temporal Dirichlet process mixtures’, Biometrical Journal, 50(1), pp. 29–42.
• LaDeau, S. L. et al. (2011) ‘Data–model fusion to better understand emerging pathogens and improve infectious disease forecasting’, Ecological Applications, 21(5), pp. 1443–1460.
• Lee, D. and Lawson, A. (2014) Cluster detection and risk estimation for spatio-temporal health data.
• Lerch, S. (2012) Verification of probabilistic forecasts for rare and extreme events. University of Heidelberg.
• Lowe, R. et al. (2013) ‘The development of an early warning system for climate-sensitive disease risk with a focus on dengue epidemics in Brazil’, Statistics in Medicine, 32(5), pp. 864–883.
• Merrill, R. (2009) Environmental Epidemiology: Principles and Methods. Jones & Bartlett Publishers.
• Soyiri, I. N. and Reidpath, D. D. (2013) ‘An overview of health forecasting’, Environmental Health and Preventive Medicine, 18(1), pp. 1–9.
• Wahyunggoro, O., Permanasari, A. E. and Chamsudin, A. (2013) ‘Utilization of Neural Network for Disease Forecasting’, in 59th ISI World Statistics Congress, pp. 549–554.
• Wakefield, J. (2008) ‘Ecologic Studies Revisited’, Annual Review of Public Health, 29, pp. 75–90. Available at: http://faculty.washington.edu/jonno/papers/annrev08.pdf.
• Wang, Z., Wang, F. and Su, S. (2011) ‘Solar Irradiance Short-Term Prediction Model Based on BP Neural Network’, Energy Procedia, 12, pp. 488–494.
• Yaffee, R. A. and McGee, M. (2000) ‘Introduction to Box-Jenkins Time Series Analysis’, in Introduction to Time Series Analysis and Forecasting: With Applications of SAS and SPSS, pp. 69–100.
• Zhang, X. et al. (2013) ‘Comparative Study of Four Time Series Methods in Forecasting Typhoid Fever Incidence in China’, PloS One, 8(5), p. e63116.