Document Type: Research article
Authors
1 Master's Student in Environmental Health Engineering, Member of the Student Research Committee, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran
2 Associate Professor, Department of Environmental Health, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran
3 Postdoctoral Researcher, School of Health, Member of the Student Research Committee, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran
Abstract
Background and Purpose: This study aims to forecast PM2.5 concentrations using four non-linear Machine Learning (ML) models.
Materials and Methods: The ML techniques employed were Light Gradient Boosting Machine (LGBM), Extreme Gradient Boosting Regressor (XGBR), Random Forest (RF), and Gradient Boosting Regressor (GBR). Meteorological and pollutant data covering 2016 to 2022 were collected to predict the Air Quality Index (AQI) in Mashhad, Khorasan Razavi Province, Iran.
Results: The ML models predicted PM2.5 concentrations accurately, with approximately 95% of their predictions falling within a factor of the observed values. Predicted PM2.5 concentrations were also compared with observed values to assess prediction accuracy. Among the four ML models, GBR performed best, with a coefficient of determination (R²) of 0.9802, a mean absolute error (MAE) of 0.54, a mean squared error (MSE) of 5.33, a root mean squared error (RMSE) of 2.31, and a mean absolute percentage error (MAPE) of 1.9%.
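The accuracy metrics named above follow standard definitions, and a minimal Python sketch may help readers reproduce them for their own observed and predicted series. The function names `regression_metrics` and `fraction_within_factor`, the `factor` parameter, and the sample values are illustrative assumptions. They are not taken from the study, which does not state the factor it used or publish its code.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, MAE, MSE, RMSE and MAPE (percent) for paired values.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE divides by each observation, so observed values must be non-zero.
    mape = 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n

    # R2 = 1 - (residual sum of squares / total sum of squares).
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot

    return {"R2": r2, "MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}


def fraction_within_factor(observed, predicted, factor):
    """Share of predictions p satisfying o / factor <= p <= o * factor.

    `factor` is left as a parameter because the abstract does not state it.
    Assumes positive observed values, as is the case for concentrations.
    """
    hits = sum(
        1 for o, p in zip(observed, predicted) if o / factor <= p <= o * factor
    )
    return hits / len(observed)


# Made-up sample values, for demonstration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
metrics = regression_metrics(observed, predicted)
share = fraction_within_factor(observed, predicted, factor=2.0)
```

Because RMSE is the square root of MSE, the reported values are mutually consistent: the square root of 5.33 is about 2.31.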
Conclusion: This study proposes a high-accuracy ML-based method for predicting PM2.5, which can support air quality monitoring worldwide and improve acute exposure assessments in epidemiological research.
Open Access Policy: This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/