Systematic review of forecasting models in disease epidemiology

The previous article discussed the role and advantages of forecasting models in disease epidemiology. Forecasting models are important tools that support public health decision making: they can be used to assess future disease trends, incidence and possible risks within a population. As discussed previously, many models are used to analyse time series data in epidemiology. This article presents a systematic review of several such models that researchers and institutions have used to predict infectious disease events and patterns. The models were applied to diseases caused by different types of pathogens (bacterial, viral, fungal or parasitic), some of which are spread by vectors that accelerate transmission.

Selection of time series analysis model in epidemiology

Forecasting models are selected based on the properties of the data, the variables to be studied and the aims and objectives of the study. Comparing different models helps determine the most appropriate model for a specific disease (Zhang et al. 2013). Four main groups of forecasting models are generally used to study infectious diseases, and they are reviewed below.
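
Comparing models usually means scoring their forecasts against observed data that were held out from fitting. The studies reviewed here report several such criteria (for example AIC, R², normalised mean square error and relative error). The sketch below is a minimal illustration, assuming two hypothetical forecasts of weekly case counts; it computes three common accuracy measures and picks the model with the lowest root mean square error (RMSE). The data, model names and the choice of RMSE as the deciding criterion are assumptions for illustration only.

```python
import math


def rmse(obs, pred):
    """Root mean square error: penalises large errors more heavily."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error, in the same units as the case counts."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r_squared(obs, pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


# Hypothetical hold-out period: observed weekly cases and two competing forecasts
observed = [12, 15, 20, 28, 35, 30, 22, 16]
forecasts = {
    "model_A": [13, 14, 21, 26, 33, 31, 24, 15],
    "model_B": [10, 12, 16, 22, 28, 33, 27, 20],
}

scores = {
    name: {"rmse": rmse(observed, p), "mae": mae(observed, p), "r2": r_squared(observed, p)}
    for name, p in forecasts.items()
}
best = min(scores, key=lambda name: scores[name]["rmse"])

for name, s in scores.items():
    print(f"{name}: RMSE={s['rmse']:.3f} MAE={s['mae']:.3f} R2={s['r2']:.3f}")
print("lowest RMSE:", best)
```

Different criteria can rank models differently, so studies often report more than one; here model_A wins on all three.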

Time series analysis models used for forecasting in epidemiology

Models based on Box-Jenkins methods

1. Moving Averages of Mixed Generalised Additive Model (MGAM) (Ma et al. 2013)
Disease and geographical region: Bacillary dysentery (BD), Shanghai, China
Variables studied: Daily meteorological data and BD case counts
Results:
  • Temperature was significantly and linearly associated with the logarithm of BD counts between 12 and 22°C.
  • The predictive model fitted the internal data well (R² = 0.875).
  • Predictions on external data had a correlation coefficient of 0.859.

2. ARIMA (Dom et al. 2013)
Disease and geographical region: Dengue, Subang Jaya, Malaysia
Variables studied: Dengue incidence and climate variables
Results:
  • ARIMA(2,0,0)(0,0,1)52, fitted to weekly data with a 52-week seasonal period, was the best model.
  • It predicted dengue cases efficiently up to 4 weeks ahead.
  • Model performance improved when climate variables were included as external regressors.

3. Univariate SARIMA (Moosazadeh et al. 2014)
Disease and geographical region: Tuberculosis, Iran
Variables studied: Monthly tuberculosis cases per 100,000 population
Results:
  • An average of 756.8 (SD = 11.9) tuberculosis cases were detected per month.
  • Among four candidate models, SARIMA(0,1,1)(0,1,1)12 had the lowest AIC (12.78).
  • This model predicted 16.75 cases per 100,000 people in 2014.

4. ARIMA (Wang et al. 2017)
Disease and geographical region: Influenza, Ningbo, China
Variables studied: Cases of influenza-like illness and climate variables
Results:
  • ARIMA(1,1,1)(1,1,0)12 was the best fit to the existing data.
  • Influenza rates in Ningbo peaked twice a year, in association with rainy or cold weather.

Systematic review of studies using Box-Jenkins methods in epidemiology

As these studies show, climate variables such as humidity and temperature were important drivers of seasonal trends in the distribution of most diseases. Both ARIMA and SARIMA can be used to predict disease incidence effectively. However, SARIMA is preferable when a disease has an inherent seasonal trend, because its seasonal terms model the recurring pattern explicitly, which gives more accurate predictions.
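
In the notation SARIMA(p,d,q)(P,D,Q)s, the model SARIMA(0,1,1)(0,1,1)12 selected by Moosazadeh et al. (2014) applies one ordinary difference (d = 1) and one seasonal difference at lag 12 (D = 1) before fitting its moving-average terms. The minimal sketch below shows only that differencing step, on a synthetic monthly series invented for illustration (a linear trend plus a repeating 12-month pattern): the first difference removes the trend, and the lag-12 difference removes the annual pattern. It does not estimate the MA(1) and seasonal MA(1) terms; a full fit would normally be done with a statistics library.

```python
def difference(series, lag=1):
    """Return y[t] - y[t - lag] for every t where the lagged value exists."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical monthly counts: baseline 100, trend of +0.5 per month,
# plus the same 12-month seasonal pattern every year (4 years of data)
seasonal_pattern = [5, 3, 2, 1, 0, 0, 1, 2, 4, 6, 8, 7]
series = [100 + 0.5 * t + seasonal_pattern[t % 12] for t in range(48)]

once = difference(series, lag=1)    # (1 - B): removes the linear trend
twice = difference(once, lag=12)    # (1 - B^12): removes the annual cycle

print("after d=1, still seasonal:", once[:12])
print("after d=1 and D=1:", twice[:5], "...")
```

After the ordinary difference the series still cycles with a 12-month period; after the seasonal difference it is flat (all zeros here, because the synthetic data contain no noise). With real surveillance data the remainder is noisy, and that residual structure is what the MA terms of the SARIMA model describe.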

Models based on Probabilistic methods

1. Multi-step Polynomial Regression (Chatterjee & Sarkar 2009)
Disease and geographical region: Malaria, Chennai, India
Variables studied: Slide positivity rate (SPR), P. vivax deaths, temperature, humidity and rainfall
Results:
  • The model had high predictive power for slide positivity rates and P. vivax deaths.
  • Both climate variables and disease incidence at zonal level influenced the predictions.
  • Long-term forecasting was efficient.

2. Multivariate time series model based on Monte Carlo simulations (Held et al. 2017)
Disease and geographical region: Norovirus gastroenteritis, Berlin, Germany
Variables studied: Weekly infection counts, age and district
Results:
  • The best-fitting model included age-structured data together with social contact data.
  • Model 4 gave the best predictions of final epidemic size and the long-term prediction curve.

3. Markov model with Monte Carlo simulation (Rein et al. 2011)
Disease and geographical region: Hepatitis C, USA
Variables studied: Demographic data and prevalence estimates for the full range of hepatitis C disease states
Results:
  • Estimated deaths due to HCV were 12.7% higher than reported deaths.
  • According to the forecast, HCV cases will peak between 2030 and 2035 and decline after 2060.
  • End-stage liver disease cases were forecast to reach 38,600 in 2030.

Systematic review of studies using probabilistic methods in epidemiology

Probabilistic models are useful for disease prediction when data are limited or relationships between variables are hidden. Moreover, forecast values should come with a measure of their uncertainty (Held et al. 2017). Probabilistic models address the inherent difficulty of predicting epidemics by attaching probabilities to the final predicted values.
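
One way to attach uncertainty to a forecast is Monte Carlo simulation: draw many possible future paths from a fitted count model and summarise them as a median and a prediction interval. The minimal sketch below assumes a simple Poisson autoregression, Y_t ~ Poisson(nu + lam * Y_{t-1}), chosen only for illustration; it is not the model fitted by Held et al. (2017). The parameter values nu and lam and the last observed count are hypothetical, not estimates from any of the reviewed studies.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson variate with Knuth's method (adequate for small means)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_paths(last_count, nu, lam, horizon, n_paths, seed=2017):
    """Simulate future weekly counts from Y_t ~ Poisson(nu + lam * Y_{t-1})."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        y, path = last_count, []
        for _ in range(horizon):
            y = sample_poisson(nu + lam * y, rng)  # next week depends on this week
            path.append(y)
        paths.append(path)
    return paths


def empirical_quantile(values, q):
    """Simple empirical quantile of a list of simulated counts."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


# Hypothetical parameters: endemic rate nu and epidemic (autoregressive) weight lam
paths = simulate_paths(last_count=20, nu=5.0, lam=0.6, horizon=4, n_paths=2000)

for week in range(4):
    draws = [p[week] for p in paths]
    lower, median, upper = (empirical_quantile(draws, q) for q in (0.05, 0.5, 0.95))
    print(f"week {week + 1}: median {median}, 90% interval [{lower}, {upper}]")
```

Because each simulated week feeds into the next, the spread of the simulated paths typically widens with the forecast horizon, which is exactly the uncertainty a single point forecast would hide.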

Models based on spatio-temporal analysis methods

1. Generalised Linear Mixed Model (Lowe et al. 2013)
Disease and geographical region: Dengue, south-east Brazil
Variables studied: Monthly notified dengue fever counts, national cartographic data, levels of urbanisation, climate and the Oceanic Niño Index
Results:
  • Successful epidemic alerts could be issued for 81% of 54 regions.
  • Predictions were possible several months in advance.

2. Spatio-temporal hierarchical Bayesian model (Lowe et al. 2014)
Disease and geographical region: Dengue, Brazil
Variables studied: Confirmed dengue cases, population density, urban population, monthly precipitation, temperature and altitude
Results:
  • Different regions of Brazil had varying levels of risk.
  • The host cities of the World Cup had low to medium risk.
  • The model allowed predictions 3 months in advance.

3. stsSEIR model (Lai et al. 2015)
Disease and geographical region: H1N1, Hong Kong
Variables studied: Daily influenza cases, demographic data of patients, population and land-use data
Results:
  • Immediate forecasts (1-2 days ahead) were more sensitive than extended forecasts (6-7 days ahead).
  • R² values were higher for 1-2 day forecasts than for 6-7 day forecasts.
  • The model predicted better for some areas of Hong Kong than for others.

Systematic review of studies using spatio-temporal methods in epidemiology

Spatio-temporal prediction models forecast future disease incidence and also identify areas likely to be at high risk in future epidemics. Because they are built on previous trends in incidence and on climate factors, they allow better prediction of disease incidence and outcomes.
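
One common way to turn probabilistic spatio-temporal output into early-warning alerts is to compute, for each region, the probability that predicted incidence exceeds a threshold, and to issue an alert when that probability is high enough. The minimal sketch below assumes the predictive samples have already been drawn from a fitted model (for example a hierarchical Bayesian model); the region names, sample values, threshold (300 cases per 100,000) and trigger probability (0.5) are hypothetical and are not taken from Lowe et al.

```python
def exceedance_probability(samples, threshold):
    """Fraction of predictive samples above the incidence threshold."""
    return sum(1 for s in samples if s > threshold) / len(samples)


def issue_alerts(region_samples, threshold, trigger):
    """Map each region to (exceedance probability, alert flag)."""
    alerts = {}
    for region, samples in region_samples.items():
        prob = exceedance_probability(samples, threshold)
        alerts[region] = (prob, prob >= trigger)
    return alerts


# Hypothetical predictive samples of next-month incidence per 100,000 population
region_samples = {
    "region_A": [120, 180, 250, 310, 420],
    "region_B": [280, 320, 350, 410, 500],
    "region_C": [50, 60, 70, 90, 110],
}

alerts = issue_alerts(region_samples, threshold=300, trigger=0.5)
for region, (prob, alert) in alerts.items():
    print(f"{region}: P(incidence > 300) = {prob:.2f}, alert = {alert}")
```

Here only region_B (probability 0.80) triggers an alert. Real systems use thousands of samples per region, and the threshold and trigger are chosen with public health authorities to balance missed epidemics against false alarms.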

Models based on artificial neural networks

1. Support Vector Machine-Firefly Algorithm model (SVM-FFA) (Ch et al. 2014)
Disease and geographical region: Malaria, Bikaner and Jodhpur, India
Variables studied: Malaria incidence and climate data (rainfall, temperature and humidity)
Results:
  • The SVM-FFA model was more accurate than ARMA, ANN and SVM alone.
  • The normalised mean square error of the fit was lowest for SVM, at 0.13.
  • The SVM-FFA model using both incidence rates and climate variables performed best.

2. Hybrid model of Grey Model (GM) and Back Propagation Artificial Neural Networks (BP-ANN) (Gan et al. 2015)
Disease and geographical region: Hepatitis B, China
Variables studied: Hepatitis B incidence rates
Results:
  • Predictions from the proposed hybrid model were more accurate than those of GM models.
  • The proposed model had the smallest relative error.

3. Back Propagation Artificial Neural Networks (BP-ANN) (Pezeshki et al. 2016)
Disease and geographical region: Cholera, Chabahar District, Iran
Variables studied: Monthly and seasonal average values of cholera incidence and climate variables (temperature, humidity, rainfall), distance from the border and from health centres
Results:
  • The best model was trained with climate and spatial data.
  • The optimised model's predictions were accurate for 80% of 100 villages, with a specificity of 44.4%.

4. Machine Learning (ML) pipeline based on Artificial Neural Network and Support Vector Machine (Colubri et al. 2016)
Disease and geographical region: Ebola
Variables studied: Suspected and positive Ebola cases, clinical data, laboratory data and patients' viral load
Results:
  • The ML-based model was useful for predicting the prognosis of Ebola patients undergoing treatment.
  • Several clinical and laboratory features could predict patient prognosis.

Systematic review of studies using artificial neural network methods in epidemiology

Artificial neural networks (ANNs) have been applied in settings with limited data and high ambiguity, where the relationships between variables are uncertain. This is because ANNs are non-linear statistical models that do not require the form of these relationships to be specified in advance. ANN-based prediction models are trained on existing data to derive the best-fitting model.
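
To illustrate what training by back-propagation means, here is a minimal one-hidden-layer network written from scratch that learns to predict the next value of a series from its two previous values. The architecture (two lagged inputs, three tanh hidden units, one linear output), the learning rate, the number of epochs and the synthetic seasonal series are all assumptions for illustration; this is not the configuration used by Gan et al. (2015) or Pezeshki et al. (2016).

```python
import math
import random


def make_lagged_pairs(series, n_lags=2):
    """Use the previous n_lags values as inputs to predict the next value."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return X, y


class TinyBPNetwork:
    """One tanh hidden layer and a linear output, trained by back-propagation (SGD)."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        output = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, output

    def mse(self, X, y):
        return sum((self.forward(x)[1] - t) ** 2 for x, t in zip(X, y)) / len(y)

    def train_step(self, x, target, lr):
        hidden, output = self.forward(x)
        err = output - target  # gradient of 0.5 * err**2 with respect to the output
        for j, h in enumerate(hidden):
            # error signal for hidden unit j, using w2[j] before it is updated
            delta = err * self.w2[j] * (1.0 - h * h)
            self.w2[j] -= lr * err * h
            self.b1[j] -= lr * delta
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * delta * xi
        self.b2 -= lr * err


# Hypothetical monthly incidence with a 12-month cycle, already scaled to [0, 1]
series = [0.5 + 0.4 * math.sin(2 * math.pi * t / 12) for t in range(60)]
X, y = make_lagged_pairs(series)

net = TinyBPNetwork(n_inputs=2, n_hidden=3, seed=42)
initial_loss = net.mse(X, y)
for epoch in range(300):
    for x, target in zip(X, y):
        net.train_step(x, target, lr=0.05)
final_loss = net.mse(X, y)
print(f"MSE before training {initial_loss:.4f}, after {final_loss:.4f}")
```

In practice the data would be split into training and test sets, and inputs such as climate variables would be added alongside lagged incidence, as in several of the studies above.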

Choosing the right model in epidemiology

Overall, this article reviewed different types of forecasting models used to predict the distribution and outcomes of disease epidemics. Each type has its own inherent characteristics, and the choice depends on the properties of the data, the disease and the aim of the study. When developing or applying any model, its limitations should be discussed, because they help in drawing better inferences. Although no ideal forecasting model exists, models that efficiently predict future outcomes and distribution patterns need to be developed to support better control and prevention strategies.

References

  • Ch, S. et al., 2014. A Support Vector Machine-Firefly Algorithm based forecasting model to determine malaria transmission. Neurocomputing, 129, pp.279–288.
  • Chatterjee, C. & Sarkar, R.R., 2009. Multi-Step Polynomial Regression Method to Model and Forecast Malaria Incidence. PLoS ONE, 4(3), p.e4276.
  • Colubri, A. et al., 2016. Transforming Clinical Data into Actionable Prognosis Models: Machine-Learning Framework and Field-Deployable App to Predict Outcome of Ebola Patients. PLoS Neglected Tropical Diseases, 10(3), p.e0004549.
  • Dom, N.C. et al., 2013. Generating temporal model using climate variables for the prediction of dengue cases in Subang Jaya, Malaysia. Asian Pacific Journal of Tropical Disease, 3(5), pp.352–361.
  • Gan, R. et al., 2015. Application of a hybrid method combining grey model and back propagation artificial neural networks to forecast hepatitis B in China. Computational and Mathematical Methods in Medicine, 2015, Article ID 328273.
  • Held, L., Meyer, S. & Bracher, J., 2017. Probabilistic forecasting in infectious disease epidemiology: the 13th Armitage lecture. Statistics in Medicine.
  • Lai, P.C. et al., 2015. An early warning system for detecting H1N1 disease outbreak – a spatio-temporal approach. International Journal of Geographical Information Science, 29(7), pp.1251–1268.
  • Lowe, R. et al., 2013. The development of an early warning system for climate-sensitive disease risk with a focus on dengue epidemics in Brazil. Statistics in Medicine, 32(5), pp.864–883.
  • Lowe, R. et al., 2014. Dengue outlook for the World Cup in Brazil: an early warning model framework driven by real-time seasonal climate forecasts. The Lancet Infectious Diseases, 14(7), pp.619–626.
  • Ma, W. et al., 2013. Applied Mixed Generalized Additive Model to Assess the Effect of Temperature on the Incidence of Bacillary Dysentery and Its Forecast. PLoS ONE, 8(4), p.e62122.
  • Moosazadeh, M. et al., 2014. Forecasting Tuberculosis Incidence in Iran Using Box-Jenkins Models. Iranian Red Crescent Medical Journal, 16(5), p.e11779.
  • Pezeshki, Z. et al., 2016. Model of Cholera Forecasting Using Artificial Neural Network in Chabahar City, Iran. International Journal of Enteric Pathogens, 4(1), pp.23–30.
  • Rein, D.B. et al., 2011. Forecasting the morbidity and mortality associated with prevalent cases of pre-cirrhotic chronic hepatitis C in the United States. Digestive and Liver Disease, 43(1), pp.66–72.
  • Wang, C. et al., 2017. Epidemiological Features and Forecast Model Analysis for the Morbidity of Influenza in Ningbo, China, 2006–2014. International Journal of Environmental Research and Public Health, 14(6), p.559.
  • Zhang, X. et al., 2013. Comparative Study of Four Time Series Methods in Forecasting Typhoid Fever Incidence in China. PLoS ONE, 8(5), p.e63116.
Chandrika Kapagunta

Research Analyst at Project Guru
Chandrika is a nature enthusiast with special love for the marine world. Her Master’s degree in Marine Biotechnology and Scuba Diving experience has made her a strong advocate of environment and marine conservation, especially through bioremediation. She believes in finding solutions to everyday human problems in nature, be it medicines, technology or philosophy. Having worked as a volunteer at The Bombay Natural History Society and as a Senior Research Fellow at Central Institute of Fisheries Education, she has had exposure to the current state of academic research, specifically in the field of environmental biotechnology.