Predictive Bayesian spatio-temporal model of the number of people who would die from COVID-19 in the cantons of Costa Rica during May 2021
Date
2022
Type
master's thesis
Authors
Zamora Mennigke, Ricardo
Abstract
Covid-19, a highly contagious respiratory disease, has caused very high mortality rates as well as significant deterioration in the patients who contract it. The objective of this study was to project, using Bayesian hierarchical spatio-temporal models, the number of people who would die from Covid-19 during May 2021, based on the officially provided historical data and regional covariates on Covid-19 available for the period from April 2020 to March 2021.
Several Bayesian spatio-temporal models are analyzed here for modeling response variables in epidemiology. The response variable, the number of deaths from the disease, is taken to follow a negative binomial distribution. The R-INLA package is used to compute the posterior estimates of the models, because it is easier to use and faster than Markov chain methods.
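As a rough illustration of why a negative binomial response suits death counts, the sketch below evaluates the distribution in a mean and size parameterization and checks its moments numerically. The function names and the parameter values are hypothetical, and whether the thesis uses exactly this parameterization is an assumption.

```python
import math


def nbinom_logpmf(y: int, mu: float, size: float) -> float:
    """Log-probability of count y under a negative binomial with mean mu > 0
    and dispersion parameter size > 0 (variance = mu + mu**2 / size)."""
    return (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(size / (size + mu))
        + y * math.log(mu / (size + mu))
    )


def nbinom_moments(mu: float, size: float, y_max: int = 2000):
    """Numerically approximate the mean and variance by summing the pmf
    over 0..y_max (adequate when mu is small relative to y_max)."""
    probs = [math.exp(nbinom_logpmf(y, mu, size)) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


# Hypothetical values: a canton-month with 3 expected deaths and size = 2.
mean, var = nbinom_moments(mu=3.0, size=2.0)
```

Unlike a Poisson model, where the variance equals the mean, the variance here exceeds the mean by mu**2 / size, which leaves room for the overdispersion that is common in small-area counts.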
Different models are compared according to their information criteria (DIC, WAIC and CPO). The best combinations are weighed along the following dimensions: the type of spatio-temporal interaction, the distribution, and the most suitable covariates for the final model.
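To make one of these criteria concrete, the following sketch computes WAIC from a matrix of pointwise log-likelihoods evaluated on posterior draws. R-INLA reports DIC, WAIC and CPO itself using its own approximations, so this only illustrates the textbook definition; the function names and the input layout are assumptions.

```python
import math
from statistics import variance


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def waic(log_lik):
    """WAIC on the deviance scale (lower is better).

    log_lik[s][i] is the log-likelihood of observation i under posterior
    draw s; at least two draws are required.
    """
    n_obs = len(log_lik[0])
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # effective number of parameters
    for i in range(n_obs):
        column = [draw[i] for draw in log_lik]
        lppd += log_mean_exp(column)
        p_waic += variance(column)  # sample variance across draws
    return -2.0 * (lppd - p_waic)
```

When comparing candidate models fitted to the same data, the one with the lower value is preferred; the variance term penalizes models whose pointwise fit fluctuates strongly across posterior draws.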
According to the literature consulted, the most important covariate is the standardized mortality ratio (SMR) of the previous month. The additional covariates examined include the human development index, the percentage of older adults, and the percentage of urban housing, all by canton. Environmental covariates extracted from the Moderate Resolution Imaging Spectroradiometer (MODIS) are also included. In the end, the only additional covariate in the final model, besides the SMR, is the percentage of urban housing by canton.
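The SMR is the ratio of observed to expected deaths in an area. A minimal sketch follows, assuming expected deaths come from applying the overall crude death rate to each canton's population; the thesis may standardize differently (for example by age group), and the canton labels and counts below are made up for illustration.

```python
def smr_by_canton(observed, population):
    """Return {canton: observed / expected} for one month.

    observed and population map canton name -> count. Expected deaths are
    the overall crude rate times the canton population (an assumption).
    """
    total_deaths = sum(observed.values())
    total_pop = sum(population.values())
    overall_rate = total_deaths / total_pop
    smr = {}
    for canton, pop in population.items():
        expected = overall_rate * pop
        # With no deaths anywhere, the ratio is undefined.
        smr[canton] = observed[canton] / expected if expected > 0 else float("nan")
    return smr


# Hypothetical cantons "A" and "B" with equal populations.
example = smr_by_canton({"A": 6, "B": 4}, {"A": 100_000, "B": 100_000})
```

An SMR above 1 means more deaths than the overall rate would predict for a canton of that size. Used as a covariate, the value from month t-1 would enter the model for month t.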
The final model, the one that best fits the data, includes the covariates together with an interaction of order 2 (that is, structured in time but not in space) and an independent (iid) random effect. The problem with this model becomes evident when the effects are analyzed: the SMR shows a null effect, which hampers the final estimation and prediction. Prediction maps estimated by canton were used to generate predictions of the number of people who would die from the disease in April and May 2021.
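One way to write down a model of this kind is sketched below. The abstract confirms only the negative binomial response, the two covariates, an interaction that is structured in time but not in space (often called a type II interaction in the disease-mapping literature), and an iid random effect. The first-order random walks, the temporal main effect, the symbols, and the omission of any offset are assumptions made for illustration.

```latex
\begin{aligned}
y_{it} \mid \mu_{it} &\sim \mathrm{NegBin}(\mu_{it},\, n), \\
\log \mu_{it} &= \beta_0 + \beta_1\,\mathrm{SMR}_{i,t-1} + \beta_2\,\mathrm{urban}_i
                 + v_i + \gamma_t + \delta_{it}, \\
v_i &\sim \mathcal{N}(0, \sigma_v^2) \quad \text{iid over cantons } i, \\
\gamma_t - \gamma_{t-1} &\sim \mathcal{N}(0, \sigma_\gamma^2), \\
\delta_{it} - \delta_{i,t-1} &\sim \mathcal{N}(0, \sigma_\delta^2)
  \quad \text{independently for each canton } i.
\end{aligned}
```

Here $y_{it}$ is the number of deaths in canton $i$ during month $t$. Each canton gets its own temporal trajectory $\delta_{i\cdot}$, smoothed over time but not shared with neighboring cantons, which is what "structured in time but not in space" amounts to.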
The results suggest that the selected covariates do not improve the fit of the model. It should be noted that limited access to certain data prevents a precise conclusion: it cannot be determined with certainty whether the number of deaths is associated with particular socioeconomic or environmental conditions. Spatio-temporal analyses also tend to be affected by the model chosen and by the small number of periods available for analysis.
For future studies, it may be relevant to analyze the differences between Markov chain methods and INLA in spatio-temporal models, since some of the models in this study are adapted from references that fitted their models with Markov chains. Further research is needed on whether INLA may be inappropriate, compared with Markov chains, for fitting spatio-temporal models with low counts or with significant limitations in the number of covariates and periods.
Keywords
Integrated Nested Laplace Approximation, INLA, Spatio-temporal models, COVID-19, Disease mapping