Civil & Environmental Engineering

Air quality analysis and PM2.5 modelling using machine learning techniques: A study of Hyderabad city in India

Article: 2243743 | Received 08 Feb 2023, Accepted 29 Jul 2023, Published online: 13 Aug 2023

Abstract

The rapid urbanization and industrialization in many parts of the world have made air pollution a global public health problem. A study by the Swiss organization IQAir indicated that 22 of the 30 most polluted cities in the world are in India, which makes air pollution a pressing issue for the country. Exposure to air pollutants has both acute (short-term) and chronic (long-term) health impacts. Among the major air pollutants, particulate matter 2.5 (PM2.5) is the most harmful, and long-term exposure to it can impair lung function. Pollutant concentrations vary over time and depend on the local meteorology and emissions at a given location. PM2.5 forecasting models can support strategies for assessing air quality and alerting the public to expected hazardous levels of air pollution. Accurate measurement and forecasting of pollutant concentrations are critical for assessing air quality and making informed strategic decisions. Recently, data-driven machine learning algorithms for PM2.5 forecasting have received considerable attention. In this work, a spatio-temporal analysis of air quality was first performed for Hyderabad, which showed that average PM2.5 concentrations during the winter were 68% higher than those during the summer. PM2.5 was then modelled using three techniques: multilinear regression, K-nearest neighbours (KNN), and histogram-based gradient boosting (HGBoost). Among these, the HGBoost regression model that used both pollution and meteorological data as inputs outperformed the other two techniques. During testing, the model achieved an R2 value of 0.859, indicating strong agreement with the observed data. The model also had a Mean Absolute Error (MAE) of 5.717 μg/m3 and a Root Mean Square Error (RMSE) of 7.647 μg/m3, further confirming its accuracy in predicting PM2.5 concentrations.
In our investigation, the HGBoost3 model outperformed the other PM2.5 models, with the lowest error and the highest R2 value. A key contribution of this study is the incorporation of the spatio-temporal relationship between air pollutants and meteorological variables in predicting air quality. This approach could support the development of more precise air pollution forecast models.

1. Introduction

Air, a vital element for sustaining life on Earth, is under pressure from both human-made and natural factors. Industrialization, volcanic eruptions, forest fires, agricultural burning, and urbanization, among others, have collectively contributed to a decline in air quality worldwide (Raju et al., Citation2022). Approximately eighty percent of cities worldwide, and ninety-eight percent of cities in middle-income nations, exceed recommended air quality standards. This escalation in air pollution has detrimental consequences, including accelerated climate change and the resulting extreme weather events, economic losses, reduced visibility, and millions of premature deaths annually (Jung et al., Citation2015). The main air pollutants include particulate matter, carbon dioxide, carbon monoxide, nitrogen oxides, sulfur oxides, and volatile organic compounds. Among these, anthropogenic fine particulate matter (PM2.5), which comprises particles with an aerodynamic diameter of less than 2.5 micrometres, poses a significant threat to air quality (Gregorio et al., Citation2022). Accurate forecasting of air pollutants and identification of pollution trends can help scientists devise effective emission control strategies. India has been grappling with air pollution for over a century, and the situation has worsened in recent decades due to rapid population growth, unplanned urbanization, and industrialization (Kapoor, Citation2017). Notably, 22 of the world’s 30 most polluted cities are in India (IQAir, Citation2020), and from 2008 to 2013, India ranked among the most polluted countries globally according to the World Health Organization’s database (World Health Organization [WHO], Citation2014). Air pollution contributes to approximately 17.8% of all deaths in India, as reported by the Global Burden of Disease study (Lancet, Citation2020). Among the major factors responsible for these deaths are ambient particulate matter (PM) and residential (household) air pollution.

Over the past two decades, India has experienced some of the most severe and widespread air quality degradation, making air pollution a critical concern for regulatory authorities. The post-monsoon season, particularly in the Indo-Gangetic Plain (IGP), is highly susceptible to severe pollution episodes. In one alarming example, during a week in early November 2017, PM2.5 levels in Delhi exceeded the WHO guideline by a factor of 25 (and the Indian standard by a factor of 11), triggering an environmental health emergency (De Vito et al., Citation2018). The WHO has since updated its air quality guidelines, establishing revised annual and 24-hour guideline levels for six major pollutants (WHO, Citation2021). For instance, the annual average guideline levels for PM10 and PM2.5 were reduced from 20 μg/m3 to 15 μg/m3 and from 10 μg/m3 to 5 μg/m3, respectively. Despite technological advances aimed at improving air quality, substantial transformative change has yet to be realized (Gulia et al., Citation2018). Air pollution therefore continues to demand urgent attention and comprehensive efforts to safeguard public health and the environment.

The introduction of real-time monitoring stations marked a significant technological advance in air quality monitoring in India. These stations were first set up in Delhi in 2006 as a pilot project and were expanded to numerous other cities after 2016 (Roychowdhury & Somvanshi, Citation2020). The number of Continuous Ambient Air Quality Monitoring (CAAQM) stations across the country has since grown to 278, serving 147 cities. India has also established the System of Air Quality and Weather Forecasting and Research (SAFAR) network, which combines manual and real-time monitoring with air quality forecasting. Initially implemented for Delhi by the Indian Institute of Tropical Meteorology (IITM), Pune, SAFAR has since extended its forecasting services to three other cities: Pune, Mumbai, and Ahmedabad (http://safar.tropmet.res.in/). These forecasting models have proven valuable in guiding policymakers’ strategic decisions. While manual and real-time monitoring stations cover much of the country’s air quality monitoring needs, researchers are actively exploring low-cost sensors to measure air quality at finer spatio-temporal scales (A. Kumar & Gurjar, Citation2019). This approach could improve the granularity and coverage of air quality data, allowing more detailed insight into local air pollution patterns.

The rising number of premature deaths associated with air pollution has drawn significant attention to the health impacts of PM2.5 (Cohen et al., Citation2017; Silva et al., Citation2013). These fine particles can penetrate deep into the respiratory tract and reach the lungs, giving rise to respiratory and cardiovascular ailments (Gronlund et al., Citation2015). Exposure to PM2.5 can irritate the eyes, nose, throat, and lungs, causing symptoms such as coughing, sneezing, runny nose, and shortness of breath. PM2.5 exposure can also severely impair respiratory function and worsen conditions such as asthma and heart disease. Studies have shown that increased daily exposure to PM2.5 is associated with more respiratory and cardiovascular hospital admissions, emergency department visits, and deaths (Baccarelli et al., Citation2014). Long-term exposure to fine particulate matter has also been linked to an elevated risk of chronic bronchitis, decreased lung function, and mortality from lung cancer and heart disease. Individuals with pre-existing lung and cardiac conditions, as well as children and the elderly, are particularly vulnerable to the harmful effects of PM2.5.

Numerous studies have investigated the spatial and temporal variations of pollutants (Analitis et al., Citation2020; Athanasiadis et al., Citation2003; Chaloulakou et al., Citation2003). Varaprasad et al. (Citation2021) studied PM2.5, carbon monoxide (CO), nitrogen oxides (NOx), and sulphur dioxide (SO2) and observed distinct seasonal fluctuations for each pollutant. In the study area, PM2.5 and PM10 concentrations had a notable impact on air quality, with PM2.5 mass concentrations higher during the post-monsoon season (Das et al., Citation2023; Li et al., Citation2019). The study also found that PM2.5 concentrations varied significantly over the course of the day and identified regional disparities. Furthermore, Li et al. (Citation2019) examined the correlation between meteorological factors and PM2.5 concentration, noting that precipitation, relative humidity, air temperature, and wind speed were negatively correlated with PM2.5 concentration. These findings shed light on the complex interactions between air pollutants and meteorological conditions and contribute to a better understanding of air quality variations.

Singh et al. (Citation2013) established a significant correlation between daily death rates and air pollution statistics and found that PM2.5 was particularly dangerous. They attributed this to the ability of fine particles to penetrate deep into the lungs, causing severe health issues and potentially increasing mortality in regions with higher levels of this pollutant. To address the complexity of predicting future pollution levels, S. Kumar et al. (Citation2020) took a different approach, using general statistical methods, including multiple linear regression, to develop models that forecast pollution levels over time. These models were then applied in a subsequent study that successfully identified correlations between certain characteristics and air pollution patterns. However, predicting the various components of time series data, such as trends, seasonality, and outliers, proved considerably challenging. These studies relied predominantly on simple statistical models, which were inadequate for capturing the intricacies of pollution trends. Two main limitations hindered accurate forecasting of air pollution patterns. First, simple statistical models cannot handle the complex interdependencies and correlations among the many variables that influence pollution levels. Air pollution is driven by a multitude of factors, including industrial activities, traffic, weather conditions, and geographical features, which makes it a complex, multifaceted problem to model accurately. Second, the models had difficulty capturing seasonal variations and long-term trends. Air pollution patterns often exhibit seasonal fluctuations due to variations in weather conditions, human activities, and natural phenomena, and these variations are crucial for understanding pollution levels and accurately predicting future trends. Capturing long-term trends and potential outliers in pollution data is also essential for making informed policy decisions and implementing effective mitigation strategies.

Accurately modelling and anticipating air quality is important because it makes the public aware of potential health risks and enables people to take precautionary measures. In recent years, machine learning approaches have gained popularity for forecasting pollutant time series, and their application to air quality forecasting has been on the rise (Le et al., Citation2020). Forecasting models play a crucial role in developing effective strategies for assessing, and informing the public about, potential spikes in the air quality index (Zhang et al., Citation2021). These models generally fall into two categories, simulation-based and data-driven, each using different methods to predict air pollutant concentrations. The simulation-based approach integrates physical and chemical models to simulate the emission, transport, and chemical transformation of air pollutants, taking into account factors such as emissions from different sources, meteorological conditions, and background characteristics (Grell et al., Citation2005). Although this approach can provide valuable insights, it faces certain challenges: uncertainties in the numerical models can reduce the accuracy of predictions, and a lack of sufficient data on certain parameters can limit their precision. Data-driven approaches, on the other hand, use statistical and machine learning techniques to identify patterns in observed data (Li et al., Citation2016). They are particularly effective with high-dimensional data, where machine learning algorithms can efficiently identify exposures related to health outcomes of interest. Data-driven approaches are especially useful for complex air pollution patterns influenced by numerous factors, because they can learn from the available data to make predictions (Ma et al., Citation2022).

Data-driven machine learning techniques have transformed the way researchers investigate the influence of various air contaminants on health outcomes (Caselli et al., Citation2009; Goudarzi et al., Citation2021a; Liu et al., Citation2021; Tsai et al., Citation2018; Ni et al., Citation2017; Niska et al., Citation2004; Siew et al., Citation2008). These methods enable researchers to analyse and interpret complex datasets, considering multiple air pollutants simultaneously and their potential impact on human health. One critical area of research is early-life exposure to ambient air pollution and its effects on children’s neurodevelopment. Studies such as that of E. Kim et al. (Citation2014) have provided mounting evidence that exposure to air pollution during early developmental stages may adversely affect cognitive development and neurobehavioral outcomes in children. This research highlights the importance of understanding the long-term consequences of air pollution exposure during critical periods of brain development and the need to protect vulnerable populations, particularly children, from harmful air pollutants. The implications of air pollution also extend beyond neurological effects. Huang et al. (Citation2020) identified air pollution as a risk factor for obesity, particularly among individuals with a higher body mass index (BMI), suggesting that air pollution may have a broader impact on metabolic health and may contribute to the obesity epidemic. Further exploration of the relationship between air pollution and various health conditions, including obesity, is essential for formulating effective public health policies and interventions.

The consequences of air pollution extend beyond public health to several industries, most notably agriculture. In China, extensive studies have revealed a significant impact of industrial air pollution on agricultural productivity: it reduces marginal products in the agricultural sector and alters critical parameters such as labour-capital dynamics (Wang et al., Citation2020). This effect on agricultural productivity inevitably ripples into the wider economy. Lower agricultural output can hamper food production and supply, potentially leading to food shortages and higher prices, and can reduce exports and slow overall economic growth. Air pollution’s effect on agriculture can also disrupt rural livelihoods; for instance, farmers may face financial burdens from decreased crop yields, exacerbating poverty and inequality. The economic impact is not limited to agriculture. Manufacturing and industrial activities may suffer from decreased productivity and increased costs due to air quality regulations and health-related absences among workers, while the healthcare sector faces a surge in demand for medical services, straining the healthcare system and drawing on resources that could be allocated elsewhere. Together, the effects of air pollution on agriculture and other industries create a vicious cycle that hinders a country’s overall economic development. Addressing air pollution is therefore a crucial priority for sustainable economic growth, improved public health, and the well-being of communities and industries alike.

Owing to rapid growth in both urbanization and industrialization, India has become highly vulnerable to atmospheric pollution in recent years, particularly in urban areas. Rising pollutant levels have worsened ambient air quality, and the Air Quality Index has been increasing at an alarming rate in major Indian cities, which motivated this study’s focus on urban air quality. Addressing this critical issue requires concrete action. Forecasting pollutants and discovering patterns in air pollution will improve the scientific knowledge needed to develop an optimal emission control strategy. Long-term solutions to air pollution require the right strategic decisions, which in turn depend on an accurate air quality measurement and forecasting system. Machine learning-based PM2.5 forecasting models offer a means of evaluating, and warning the public about, potentially harmful levels of air pollution (X. Feng et al., Citation2015; Goudarzi et al., Citation2021b).

The current study has four goals: (a) conduct a spatio-temporal air quality analysis and explore seasonal changes in PM2.5 levels; (b) conduct a correlation analysis; (c) examine the relationship between the input and output variables; and (d) apply several machine learning techniques to PM2.5 modelling and identify the best PM2.5 forecasting model.

2. Study area

This study focuses its spatio-temporal analysis and PM2.5 modelling on the city of Hyderabad. As a fast-growing global city, Hyderabad has seen its air quality deteriorate significantly over the last decade, owing mostly to increased traffic and the presence of numerous industries in its northern and eastern sectors. Although Hyderabad is an important urban centre, the literature review revealed few studies on its air quality. Furthermore, observed PM2.5 levels in the city have consistently exceeded the limits prescribed by the Central Pollution Control Board (CPCB). For these reasons, Hyderabad was chosen as the study area.

2.1. Hyderabad

Hyderabad has a hot semi-arid climate with distinct weather patterns. From June to October, the city receives substantial rainfall from the southwest monsoon, which accounts for much of its annual precipitation. The mean annual temperature is around 26.6 °C, with monthly average temperatures ranging from 21 to 33 °C. May is the hottest month, with temperatures reaching 36–39 °C, while the coolest period is December to January, with temperatures ranging from 14.5 to 28 °C. Hyderabad has a diverse economy driven by key industries such as information technology, pharmaceuticals and drugs, manufacturing, food, and hospitality. The city’s prominence as a major IT hub has earned it the moniker “Cyberabad”, with numerous IT parks and multinational corporations operating within its boundaries. The pharmaceutical and biotechnology sectors have also flourished, and several prominent companies have their headquarters and manufacturing units in the city. The manufacturing, hospitality, and food industries also contribute significantly to the city’s economic growth.

Alongside these economic strengths, however, Hyderabad faces significant pollution from vehicular emissions. The growing number of vehicles, coupled with traffic congestion, contributes to air pollution in the city. Efforts to address this issue include promoting public transportation, encouraging electric vehicles, and implementing stricter emission norms for industries. These measures aim to improve air quality and reduce pollution levels for the residents of Hyderabad. Figure 1 shows the geographical location of Hyderabad district and the six monitoring stations.

Figure 1. (a) Map of Hyderabad (b) the locations of Hyderabad’s six Monitoring Stations (MS).

3. Data and methodology

The first step is data collection and pre-processing. Before modelling, a spatio-temporal air quality analysis of the study area is performed to understand the trends in air pollution. PM2.5 modelling is then carried out using several machine learning models, and their performance is evaluated. Figure 2 depicts the general methodology used for PM2.5 modelling.

Figure 2. General methodology for PM2.5 modelling.

The research begins with data collection. The collected data then undergo a pre-processing step to prepare them for modelling. Pre-processing comprises four stages: (a) data integration, (b) data cleaning and organization, (c) checking for missing data and outliers, and (d) preparing distinct training, testing, and validation datasets. After pre-processing, a correlation analysis is conducted to reveal the relationships between the input and output variables, which helps identify the key factors influencing PM2.5 levels in the study area. Next, a spatio-temporal air quality analysis is undertaken to explore how pollutant levels varied during the study period. Spatio-temporal analysis evaluates air pollution levels across geographical locations and over time, which is crucial for understanding the distribution, trends, and patterns of air pollutants in different areas. By identifying pollution hotspots, seasonal variations, and long-term trends, such analysis supports the formulation of effective air quality management strategies and policies. In this study, geographic information systems (GIS), statistical methods, and machine learning techniques were used to process and analyse the datasets. For PM2.5 modelling, three machine learning algorithms are used: multilinear regression (MLR), KNN regression, and HGBoost regression. These algorithms span a range of model complexity, from a linear baseline (MLR) to non-linear methods (KNN and HGBoost) capable of capturing complex relationships between variables. The performance of each model is assessed using evaluation metrics such as MAE, RMSE, and R2, which measure how well the models predict PM2.5 levels. Finally, the models are compared to determine the best PM2.5 forecasting model for the study area. The selected model can serve as a tool for assessing and managing air quality in Hyderabad, helping authorities implement effective measures to protect public health from air pollution.
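The four pre-processing stages described above can be sketched with pandas. This is a minimal illustration under stated assumptions, not the authors’ code: the column names (`station`, `date`, `PM2.5`), the inner join, the 1.5 × IQR outlier rule, and the 70/15/15 chronological split are hypothetical choices, since the paper does not specify them.

```python
import pandas as pd


def preprocess(pollutants: pd.DataFrame, meteo: pd.DataFrame,
               train_frac: float = 0.7, val_frac: float = 0.15):
    """Integrate, clean and split daily station records (illustrative only)."""
    # Stage 1 - data integration: join pollutant and meteorological records
    # on the (hypothetical) shared keys "station" and "date".
    df = pollutants.merge(meteo, on=["station", "date"], how="inner")

    # Stage 2 - cleaning and organisation: order records chronologically.
    df = df.sort_values(["date", "station"]).reset_index(drop=True)

    # Stage 3 - missing data and outliers: drop incomplete rows, then drop rows
    # where any numeric value lies outside 1.5 x IQR (an assumed rule).
    df = df.dropna()
    numeric = df.select_dtypes(include="number")
    q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
    iqr = q3 - q1
    in_range = ((numeric >= q1 - 1.5 * iqr) & (numeric <= q3 + 1.5 * iqr)).all(axis=1)
    df = df[in_range]

    # Stage 4 - chronological train/validation/test split (no shuffling,
    # so later days are never used to predict earlier ones).
    n_train = int(len(df) * train_frac)
    n_val = int(len(df) * val_frac)
    train = df.iloc[:n_train]
    val = df.iloc[n_train:n_train + n_val]
    test = df.iloc[n_train + n_val:]
    return train, val, test
```

A chronological split is used here because shuffled splits of daily time series can leak information from later days into training; the paper does not say how its split was made.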

3.1. Data collection

Air pollutant and meteorological data were obtained from the all-India CAAQMS portal, which is administered and operated by the Central Pollution Control Board (CPCB) of India. Daily (24-hour) meteorological and pollutant data covering January 2018 to December 2019 were collected from the six Continuous Ambient Air Quality Monitoring Stations (CAAQMS) in Hyderabad city (CPCB).

3.1.1. Data pre-processing

Before any modelling, data pre-processing is a critical step to prepare the input data for the machine learning models. This step involves removing inconsistent data, handling null or missing values, and addressing outliers that could disrupt the modelling process. Null or missing values are identified and removed to ensure the quality and accuracy of the dataset. Outliers, which can arise from faulty readings or from exceptional events such as forest fires or religious gatherings that strongly affect pollutant levels, are also addressed. Multilinear regression (MLR), KNN, and HGBoost regression models are used in this work, and their performance is evaluated and compared using R2, MAE, RMSE, and MSE. Through this comparison, the study aims to identify the most suitable PM2.5 forecasting model for the study area, supporting informed decisions to manage and mitigate air pollution.
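The evaluation metrics named above have standard definitions; the NumPy sketch below computes them for observed versus predicted concentrations. The function name `regression_metrics` is hypothetical, and R2 is taken here as 1 − SS_res/SS_tot, the usual coefficient of determination (the same definition as scikit-learn’s `r2_score`); the paper does not spell out its formulas.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for observed vs predicted PM2.5."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))                       # mean absolute error
    mse = np.mean(err ** 2)                          # mean squared error
    rmse = np.sqrt(mse)                              # same units as PM2.5 (μg/m3)
    ss_res = np.sum(err ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}
```

For observed values [40, 50, 60, 70] and predictions [42, 48, 61, 69], this gives MAE = 1.5, MSE = 2.5, RMSE ≈ 1.58 and R2 = 0.98. MAE and RMSE are in the pollutant’s own units, while R2 is dimensionless.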

4. Results and discussion

4.1. General

A spatio-temporal air quality analysis was performed for the two-year period from January 2018 to December 2019 to understand how air pollutant levels varied across locations and over time. A correlation analysis was also conducted on the collected data to reveal the relationships between variables. For PM2.5 modelling, three regression techniques were employed: multilinear regression, K-nearest neighbour regression, and HGBoost regression. These models were used to predict PM2.5 levels from the available data, and their results were then compared.

4.2. Spatio-temporal air quality analysis

Figure 3 shows box plots of the variation in each pollutant at each Monitoring Station (MS) from January 2018 to December 2019. PM2.5, NO2, SO2, CO, and ozone were analysed for all six stations using daily average data. CO is plotted separately because it is measured in different units (mg/m3 rather than μg/m3). Table 1 lists the CPCB air quality standards for each pollutant for satisfactory conditions.

Figure 3. Variation of pollutants for different stations over the period from January 2018 to December 2019.

Table 1. Pollutant CPCB criteria for acceptable conditions

The CPCB standard for PM2.5 under satisfactory conditions is 60 μg/m3. Daily average PM2.5 concentrations exceeded this limit at all stations, mostly during the winter season. Over the two-year study period, the average PM2.5 concentration exceeded the CPCB threshold of 60 μg/m3 on 246 days. The highest PM2.5 concentration recorded during the study period, 130.54 μg/m3, was observed at MS6 on 6 November 2019. The median of 52.04 μg/m3 at MS5 indicates that PM2.5 pollution at MS5 is relatively high compared with the other stations (Table 2).
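Counting exceedance days is a simple aggregation. The sketch below assumes a long-format table with one row per station and day (hypothetical columns `station`, `date`, `PM2.5`) and counts days on which the cross-station mean exceeds 60 μg/m3. The paper does not state whether its count of 246 days uses a city-wide mean or individual stations, so this averaging choice is an assumption.

```python
import pandas as pd

CPCB_PM25_LIMIT = 60.0  # μg/m3, CPCB 24-hour PM2.5 standard cited in the text


def exceedance_days(daily: pd.DataFrame, limit: float = CPCB_PM25_LIMIT) -> int:
    """Count days on which the mean PM2.5 across stations exceeds `limit`."""
    # Average all station readings for each calendar day, then count the days
    # whose average is strictly above the limit.
    city_mean = daily.groupby("date")["PM2.5"].mean()
    return int((city_mean > limit).sum())


# Hypothetical readings: two stations over three days.
example = pd.DataFrame({
    "date": ["2019-11-01"] * 2 + ["2019-11-02"] * 2 + ["2019-11-03"] * 2,
    "station": ["MS1", "MS2"] * 3,
    "PM2.5": [50.0, 80.0, 55.0, 60.0, 70.0, 70.0],
})
print(exceedance_days(example))  # daily means 65.0, 57.5, 70.0 -> prints 2
```

Counting station-days instead would mean dropping the `groupby` and comparing each row directly with the limit.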

Table 2. Variation in PM2.5 levels among stations

Figure 3 also shows that MS1 and MS5 have markedly higher NO2 concentrations than the other monitoring sites. At MS1, the mean daily NO2 concentration is 48.62 μg/m3, and the maximum value observed during the monitoring period is 103.42 μg/m3, which exceeds the CPCB limit of 80 μg/m3 by 29%. At MS5, the mean daily NO2 concentration is 48.11 μg/m3, with a peak of 130.59 μg/m3 on 9 February 2019; this maximum exceeds the CPCB limit by approximately 63% and is the highest NO2 concentration recorded during the entire monitoring period.
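The exceedance percentages follow from (value − limit) / limit × 100. A small helper (hypothetical name `percent_over_limit`) recomputes them from the maxima reported in the text.

```python
def percent_over_limit(value: float, limit: float) -> float:
    """Percentage by which `value` exceeds a regulatory `limit`."""
    return (value - limit) / limit * 100.0


NO2_LIMIT = 80.0  # μg/m3, CPCB 24-hour NO2 limit cited in the text
reported_max = {"MS1": 103.42, "MS5": 130.59}  # maxima reported in the text
for station, peak in reported_max.items():
    print(f"{station}: {percent_over_limit(peak, NO2_LIMIT):.1f}% above the limit")
```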

Figure 4 is a box plot of CO levels across stations. As with NO2, CO concentrations were highest at MS5, where the average was 0.75 mg/m3. Although this is below the CPCB standard of 2 mg/m3, CO concentrations there are highly variable, with a standard deviation of 0.37 mg/m3. Figure 1 shows the locations of air quality monitoring stations MS1–MS6. Notably, MS5 (Sanathnagar) is located at a heavily used crossroads; CO and NO2 are mostly released by traffic-related fuel combustion, which explains the higher concentrations at this station. SO2 and ozone concentrations at all sites remain within the CPCB acceptable limits of 80 μg/m3 and 100 μg/m3, respectively. Figure 5 shows the seasonal (summer and winter) variation of PM2.5 over the two-year period as a box plot.
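Per-station summary statistics such as those reported above (means, standard deviations, medians and maxima) can be obtained with a pandas group-by. The column names and readings below are hypothetical. Note that pandas’ `std` returns the sample standard deviation (ddof = 1) by default; the paper does not say which definition it used.

```python
import pandas as pd


def station_summary(daily: pd.DataFrame, pollutant: str) -> pd.DataFrame:
    """Mean, sample standard deviation, median and maximum per station."""
    return daily.groupby("station")[pollutant].agg(["mean", "std", "median", "max"])


# Hypothetical daily CO readings (mg/m3) for two stations.
co = pd.DataFrame({
    "station": ["MS1", "MS1", "MS1", "MS5", "MS5", "MS5"],
    "CO": [0.5, 1.0, 1.5, 0.4, 0.8, 1.2],
})
print(station_summary(co, "CO"))
```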

Figure 4. CO variations at various stations.

Figure 5. Seasonal variation of PM2.5.

As Figure 5 shows, PM2.5 concentrations rise markedly in winter. This increase can be attributed to temperature inversions, which are prevalent in the winter months and trap pollutants near the ground. Across all monitoring stations, PM2.5 concentrations were on average 68% higher in winter than in summer. The largest increase, 82%, was recorded at MS2, and the smallest, 52%, at MS3. MS6 had the highest average winter PM2.5 concentration (69.12 μg/m3), while MS4 had the lowest average summer concentration (30.69 μg/m3). Table 3 summarizes the average winter and summer PM2.5 concentrations at all sites.
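The seasonal percentage increase at each station is (winter mean − summer mean) / summer mean × 100. The sketch below takes per-station seasonal means as input, so it does not depend on how the seasons are defined by month. The station labels and numbers in the usage line are hypothetical, and averaging the per-station percentages (rather than computing the change in city-wide means) is an assumption about how the 68% figure was obtained.

```python
def seasonal_increase(seasonal_means):
    """Per-station and average % rise in mean PM2.5 from summer to winter.

    `seasonal_means` maps a station label to (summer_mean, winter_mean).
    """
    per_station = {
        station: (winter - summer) / summer * 100.0
        for station, (summer, winter) in seasonal_means.items()
    }
    average = sum(per_station.values()) / len(per_station)
    return per_station, average


# Hypothetical seasonal means in μg/m3 (not the values in Table 3).
per_station, average = seasonal_increase({"A": (40.0, 60.0), "B": (50.0, 100.0)})
print(per_station, average)  # {'A': 50.0, 'B': 100.0} 75.0
```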

Table 3. Seasonal variation in PM2.5 across different stations

4.3. Correlation analysis

A correlation analysis was performed on the acquired data to better understand the relationships between the input variables and the output variable (PM2.5). The analysis was carried out separately for the meteorological parameters and the pollutants. Table 4 gives details of the meteorological parameters used in the study.

Table 4. Information on the meteorological input variables

The correlations between the meteorological parameters and PM2.5 are presented as a correlation heatmap in Figure 6.

Figure 6. Correlation heatmap of meteorological parameters with PM2.5.

The results show that all the meteorological input parameters are negatively correlated with PM2.5. Wind direction (WD) has the strongest negative correlation, with a Pearson correlation coefficient (r) of −0.62, and wind speed (WS) has r = −0.24. These negative correlations arise because wind transports the light PM2.5 particles: a higher wind speed, or a shift in wind direction away from the monitoring station, carries the particles away and thereby reduces their concentration (Raju et al., 2022).
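For reference, Pearson's r is the covariance of two series divided by the product of their standard deviations. A minimal stdlib sketch follows; the wind speed and PM2.5 values are hypothetical and only illustrate a negative association like the one reported above.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Covariance and standard deviations share the 1/n factor, so it cancels
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical paired observations: wind speed (m/s) and PM2.5 (μg/m3)
wind_speed = [1.0, 2.0, 3.0, 4.0, 5.0]
pm25 = [80.0, 70.0, 65.0, 50.0, 40.0]
print(round(pearson_r(wind_speed, pm25), 3))
```

A correlation heatmap such as Figure 6 is simply this coefficient evaluated for every pair of variables.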

The r value for temperature (Temp) was −0.22. This negative correlation can be attributed to strong convection at higher temperatures. On warm days the land surface heats up faster than the air, creating a temperature difference between the air near the surface and the air above it. The warmer, less dense air near the surface rises by convection and carries the light PM2.5 particles upward with it. As a result, PM2.5 concentrations near the ground decrease, lowering the overall level of PM2.5 in the lower atmosphere.

The r values for RH and SR are −0.02 and −0.075, respectively, showing that both parameters have only a weak linear association with PM2.5.

Table 5 gives details of the pollutant parameters used in the study.

Table 5. Details about the pollutant input variables

To investigate the relationship between the pollutant parameters and PM2.5 levels, a correlation analysis was performed, and the results are visualised as a correlation heatmap (Figure 7). All pollutant parameters were positively correlated with PM2.5. NO2 had the strongest positive correlation, with an r value of 0.54, which suggests that NO2 is a key precursor of PM2.5. Ozone showed the second-strongest positive correlation (r = 0.32), and SO2 a mildly positive correlation (r = 0.22). CO had the lowest r value, indicating a negligible linear association with PM2.5 concentrations. The correlation analysis thus clarified how the different input parameters are related to the dependent variable, PM2.5. Although some parameters were only weakly correlated with the target variable, all parameters were used as model inputs to ensure full consideration, since including them adds little computational complexity.

Figure 7. Pollutant parameter correlation heatmap with PM2.5.

4.4. PM2.5 modeling

PM2.5 was modelled using MLR, KNN and HGBoost. The data were split into 80% for training and 20% for testing, and for each algorithm three models with different sets of input variables were built: one used only the meteorological parameters as independent variables, one used only the pollutant variables, and the third (MLR3, KNN3 and HGBoost3) used a combination of meteorological and pollutant data.
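The 80/20 split and the three input sets can be sketched as follows. This is a minimal stdlib stand-in, not the authors' code: the column names are hypothetical abbreviations taken from the text, and a random shuffle is assumed because the section does not say whether the split was random or chronological.

```python
import random

# Hypothetical column names based on the abbreviations used in the text
METEOROLOGICAL = ["RH", "Temp", "WS", "WD", "SR"]
POLLUTANTS = ["NO2", "SO2", "CO", "Ozone"]
FEATURE_SETS = {
    "met_only": METEOROLOGICAL,
    "pollutant_only": POLLUTANTS,
    "combined": POLLUTANTS + METEOROLOGICAL,
}

def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle rows with a fixed seed and split them into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]

def to_xy(rows, features, target="PM2.5"):
    """Extract a feature matrix X and target vector y from dict rows."""
    X = [[row[f] for f in features] for row in rows]
    y = [row[target] for row in rows]
    return X, y

# Hypothetical toy dataset with ten records
rows = [{**{f: float(i) for f in POLLUTANTS + METEOROLOGICAL}, "PM2.5": 10.0 * i}
        for i in range(10)]
train, test = split_rows(rows)
for name, feats in FEATURE_SETS.items():
    X_train, y_train = to_xy(train, feats)
    print(name, len(X_train), len(X_train[0]))
```

Because air quality data are time series, a chronological split (training on earlier records, testing on later ones) is a common alternative; it would replace the shuffle with slicing by timestamp.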

4.4.1. Multi Linear Regression (MLR)

Table 6 presents the results of the multilinear regression models. The MLR1 model, with only pollutant data as input, performed worst, with a testing R2 of 0.345. Its errors were also higher than those of the other two models, with an RMSE of 18.06 μg/m3 and an MAE of 14.552 μg/m3. During testing, the MLR2 model, which used meteorological parameters as input, performed better than MLR1, with lower errors and an R2 of 0.467.
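The evaluation metrics used throughout this section can be written out explicitly. The sketch below computes MAE, MSE, RMSE and R2, with R2 taken as the coefficient of determination 1 − SS_res/SS_tot (the same definition as scikit-learn's `r2_score`). The inputs in the usage line are hypothetical.

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for paired observed/predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - (mse * n) / ss_tot                    # mse * n is the residual sum of squares
    return {"MAE": mae, "MSE": mse, "RMSE": sqrt(mse), "R2": r2}

# Hypothetical observed and predicted PM2.5 values (μg/m3)
print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```

RMSE is the square root of MSE, so the two always rank models identically; MAE weights all errors equally, whereas RMSE penalizes large errors more heavily.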

Table 6. Results obtained for multi linear regression

The MLR3 model, with both pollutant and meteorological data as input, outperformed the other two models. It had the smallest errors of the three, with an MAE of 11.297 μg/m3 and an RMSE of 14.453 μg/m3, and R2 values of 0.577 and 0.581 during training and testing, respectively. The testing R2 of MLR3 was 68.4% and 24.4% higher than those of MLR1 and MLR2, respectively.
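The percentage gains quoted above are relative changes in testing R2 and can be checked directly from the reported values:

$$
\frac{0.581 - 0.345}{0.345} \times 100 \approx 68.4\%, \qquad \frac{0.581 - 0.467}{0.467} \times 100 \approx 24.4\%
$$

The same relative-change formula applies to the KNN comparisons, for example (0.682 − 0.471)/0.471 × 100 ≈ 44.8%.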

Figure 9 shows the outcomes of the MLR3 model as a joint plot comparing the predicted and test values. The testing R2 of 0.581 indicates a moderate fit, and the correlation coefficient r between the predicted and test values was 0.76, indicating a strong positive association. The distribution plot with a kernel density estimator (KDE) in Figure 8 shows the distribution of errors (residuals) in the modelling results. The base of the distribution is broad, indicating that the errors are quite variable. The observed peak near zero error is about 0.03, showing a minor bias in the model's predictions. The fitted MLR3 regression equation is:

Figure 8. Distribution plot for MLR3 model.

Figure 9. Joint plot for MLR3 model.

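$$
\mathrm{PM}_{2.5} = 72.975 + 0.328\,\mathrm{NO_2} + 0.279\,\mathrm{SO_2} + 0.090\,\mathrm{CO} + 0.281\,\mathrm{Ozone} - 0.147\,\mathrm{RH} - 0.144\,\mathrm{Temp} - 0.103\,\mathrm{WS} - 0.154\,\mathrm{WD} - 0.070\,\mathrm{SR}
$$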
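Coefficients like these come from ordinary least squares. The stdlib sketch below solves the normal equations (XᵀX)β = Xᵀy by Gaussian elimination with partial pivoting. It is illustrative only: library solvers such as `numpy.linalg.lstsq` or scikit-learn's `LinearRegression` are numerically more robust, and the section does not state which implementation or input scaling the authors used, so it is not confirmed whether the coefficients above apply to raw or standardized inputs.

```python
def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by solving the normal equations."""
    A = [[1.0] + list(row) for row in X]  # prepend an intercept column
    k = len(A[0])
    # Augmented system [X^T X | X^T y]
    M = [[sum(A[r][i] * A[r][j] for r in range(len(A))) for j in range(k)]
         + [sum(A[r][i] * y[r] for r in range(len(A)))] for i in range(k)]
    # Forward elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: inputs are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution
    beta = [0.0] * k
    for i in range(k - 1, -1, -1):
        s = sum(M[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (M[i][k] - s) / M[i][i]
    return beta  # [intercept, b1, ..., bk]

def predict(beta, row):
    """Evaluate the fitted linear equation for one input row."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))

# Hypothetical data generated from y = 1 + 2*x1 - 3*x2
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, -2, 0, 2]
beta = fit_ols(X, y)
print([round(b, 6) for b in beta])
```

Applied to the combined pollutant and meteorological inputs, the returned vector would play the role of the intercept and the ten coefficients in the MLR3 equation.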

4.4.2. K-nearest neighbour regression model (KNN model)

Table 7 presents the results of the KNN regression models, for which K = 2 was used. The KNN1 model, with pollutant data alone as input, performed worst, with a testing R2 of 0.471. Its errors were also higher than those of the other two models, with an RMSE of 16.229 μg/m3 and an MAE of 11.717 μg/m3. During testing, the KNN2 model, with meteorological parameters as input, outperformed KNN1, with lower errors and an R2 of 0.498.
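KNN regression predicts a value by averaging the targets of the K training points closest to the query. A minimal stdlib sketch with K = 2 and Euclidean distance follows; uniform (unweighted) averaging is assumed, matching the default of scikit-learn's `KNeighborsRegressor`, although the section does not state which library or distance weighting was used. The one-dimensional data are hypothetical.

```python
from math import dist

def knn_predict(X_train, y_train, query, k=2):
    """Average the targets of the k nearest training points (Euclidean distance)."""
    if not 1 <= k <= len(X_train):
        raise ValueError("k must be between 1 and the number of training points")
    # Rank training indices by distance to the query point
    ranked = sorted(range(len(X_train)), key=lambda i: dist(X_train[i], query))
    nearest = ranked[:k]
    return sum(y_train[i] for i in nearest) / k

# Hypothetical one-feature training data
X_train = [[0.0], [1.0], [2.0], [10.0]]
y_train = [0.0, 10.0, 20.0, 100.0]
print(knn_predict(X_train, y_train, [1.4], k=2))
```

Because distances mix all input variables, features with large numeric ranges dominate unless the inputs are scaled (for example, min-max scaled) before fitting. A small K such as 2 also makes predictions sensitive to individual training points.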

Table 7. Results obtained for K- nearest neighbour regression

Among the three models evaluated, the KNN3 model, which used both pollutant and meteorological data as input, performed best. It had the lowest errors, with a mean absolute error (MAE) of 9.137 μg/m3 and a root mean square error (RMSE) of 12.594 μg/m3. The KNN3 model achieved R2 values of 0.897 and 0.682 during training and testing, respectively. Its testing R2 was 44.8% higher than that of KNN1 and 36.9% higher than that of KNN2, confirming its better ability to predict PM2.5 levels.

Figure 11 shows the results for the KNN3 model as a joint plot of the predicted versus test values. The testing R2 was 0.682, and the correlation coefficient (r) between the predicted and test values was 0.83.

Figure 10. Distribution plot for KNN3 model.

Figure 10 presents the distribution plot with a KDE, representing the distribution of errors (residuals) in the modelling results. The base of the distribution is wide, suggesting considerable spread in the error values. However, the peak for zero error was observed at 0.05, which is higher than that of KNN2. This indicates that KNN3 performed better than the KNN2 model and that its predictions were closer to the actual values.

Figure 11. Joint plot for KNN3 model.

4.4.3. Histogram based gradient boost (HGBoost) model

Histograms conveniently illustrate the distribution of data, that is, the number of values falling within each range. When the input features are discretised into bins, as in a histogram, the number of candidate split points that each tree has to evaluate drops sharply, which makes tree construction much faster. Combining histogram-based binning with gradient boosting yields high-performance machine learning ensembles (Nhat-Duc & Van Duc, 2023). HGBoost maps continuous input values to integer bin indices and builds each tree from per-bin summary statistics (histograms) instead of relying on sorted continuous values. HGBoost is well suited to capturing complex nonlinear relationships in the data: because it combines gradient boosting with histogram-based techniques, it can model and exploit interactions among features effectively.
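The core of the histogram approach is that, once a feature has been binned, the best split for a tree node can be found by scanning per-bin gradient sums instead of sorted raw values. The sketch below shows this for a single feature. It assumes squared-error loss (so each sample's hessian is 1 and per-bin counts stand in for hessian sums) and applies no regularisation; it is not scikit-learn's implementation (whose `HistGradientBoostingRegressor` uses at most 255 bins by default), and the section does not say which library the authors used. The data are hypothetical.

```python
from bisect import bisect_right

def bin_edges(values, max_bins=255):
    """Approximate quantile thresholds: at most max_bins - 1 distinct edges."""
    s = sorted(values)
    n = len(s)
    edges = []
    for i in range(1, max_bins):
        e = s[(i * n) // max_bins]
        if not edges or e > edges[-1]:  # keep edges strictly increasing
            edges.append(e)
    return edges

def to_bins(values, edges):
    """Map each value to an integer bin index (number of edges <= value)."""
    return [bisect_right(edges, v) for v in values]

def best_split(bins, gradients, n_bins):
    """Scan a gradient histogram for the split with the largest squared-error gain."""
    g_hist = [0.0] * n_bins
    c_hist = [0] * n_bins
    for b, g in zip(bins, gradients):  # one pass over the data builds the histogram
        g_hist[b] += g
        c_hist[b] += 1
    g_total, c_total = sum(g_hist), sum(c_hist)
    best_bin, best_gain = None, 0.0
    g_left, c_left = 0.0, 0
    for b in range(n_bins - 1):  # candidate split: bins <= b go to the left child
        g_left += g_hist[b]
        c_left += c_hist[b]
        c_right = c_total - c_left
        if c_left == 0 or c_right == 0:
            continue
        g_right = g_total - g_left
        gain = g_left**2 / c_left + g_right**2 / c_right - g_total**2 / c_total
        if gain > best_gain:
            best_bin, best_gain = b, gain
    return best_bin, best_gain

# Hypothetical single feature and target
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [10, 10, 10, 10, 50, 50, 50, 50]
pred = sum(y) / len(y)            # initial constant prediction
grads = [pred - t for t in y]     # gradient of 0.5 * (pred - t)**2 w.r.t. pred
edges = bin_edges(x, max_bins=4)
bins = to_bins(x, edges)
print(edges, best_split(bins, grads, len(edges) + 1))
```

Here the best split sends bins 0 and 1 (x < 5) to the left, separating the low and high targets. A full booster repeats this for every feature and node, fits leaf values, and adds many such trees sequentially.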

Table 8 presents the results of the HGBoost models. The HGBoost2 model, using only pollutant data as input, performed worst, with a testing R2 of 0.667. Its errors were also higher than those of the other two models, with an RMSE of 11.748 μg/m3 and an MAE of 8.807 μg/m3. During testing, the HGBoost1 model, using meteorological parameters as input, outperformed HGBoost2, with lower errors and an R2 of 0.728.

Table 8. Results obtained for HGBoost

The HGBoost3 model, with both pollutant and meteorological data as input, performed best of the three models. It had the lowest errors, with an MAE of 5.717 μg/m3 and an RMSE of 7.647 μg/m3, and R2 values of 0.981 and 0.859 during training and testing, respectively. Its testing R2 was about 18% and 28.8% higher than those of the HGBoost1 and HGBoost2 models, respectively.

Figure 13 shows the outcomes of the HGBoost3 model as a joint plot of the predicted versus test values. The model achieved a testing R2 of 0.859, demonstrating strong predictive performance, and r between the predicted and test values was 0.93, indicating a strong positive association. Figure 12 shows the distribution of errors (residuals) using a distribution plot with a KDE. The base of the distribution is broad, indicating that the errors are variable. Notably, the peak for zero error is centred at 0.04, indicating that the model is accurate and has a low bias in its predictions.
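The distribution plots in Figures 8, 10 and 12 overlay a kernel density estimate on the residuals. The sketch below evaluates a Gaussian KDE at a chosen error value, such as zero. The default bandwidth uses the normal-reference rule of thumb 1.06·σ·n^(−1/5), which is an assumption: the plotting library used for the figures may select a different bandwidth, and the bandwidth changes both the height and the smoothness of the curve. The residuals in the usage line are hypothetical.

```python
from math import exp, pi, sqrt
from statistics import stdev

def gaussian_kde(residuals, x, bandwidth=None):
    """Gaussian kernel density estimate of the residuals, evaluated at x."""
    n = len(residuals)
    if bandwidth is None:
        # Normal-reference rule of thumb; needs at least two distinct residuals
        bandwidth = 1.06 * stdev(residuals) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * sqrt(2 * pi))
    # Sum of Gaussian kernels centred on each residual
    return norm * sum(exp(-0.5 * ((x - r) / bandwidth) ** 2) for r in residuals)

# Hypothetical residuals (observed minus predicted PM2.5, μg/m3)
residuals = [-6.0, -2.5, -1.0, 0.0, 0.5, 1.5, 3.0, 8.0]
print(gaussian_kde(residuals, 0.0))
```

Because the estimate integrates to one, a narrower spread of residuals produces a taller, sharper curve, which is why the width of the base and the height of the peak are read together.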

Figure 12. Distribution plot for HGBoost3 model.

Figure 13. Joint plot for HGBoost3 model.

4.4.4. Comparison of model performance

Figures 14 and 15 show that the KNN3 model performed better than the MLR3 and KNN2 models: its MAE (9.137 μg/m3) and RMSE (12.594 μg/m3) were lower than the corresponding values for MLR3 and KNN2. KNN3 did not, however, outperform the HGBoost3 model. Overall, HGBoost3 performed better than all the other models, including KNN3, with the lowest MAE and RMSE of 5.717 μg/m3 and 7.647 μg/m3, respectively, and the lowest MSE of 58.487 (μg/m3)2. HGBoost3 therefore gave the smallest errors for PM2.5 modelling.

Figure 14. MAE and RMSE for different models.

Figure 15. MSE for different models.

Figure 16 compares the R2 values during training and testing, giving a clear assessment of the models' performance. KNN3 outperformed both MLR3 and KNN2, with a testing R2 of 0.682 compared with 0.581 for MLR3 and 0.498 for KNN2. Across all models, however, HGBoost3 performed best in both training and testing, with R2 values of 0.981 and 0.859, respectively.

Figure 16. R2 for different models.

Based on its minimal errors and maximum R2 values, the HGBoost3 model was found to be the most successful for PM2.5 modelling. The spatiotemporal air quality evaluations were carried out over a two-year period, from January 2018 to December 2019, and the full results are presented above. The correlation analysis clarified the relationships between the input parameters and the output variable. PM2.5 was modelled using multilinear regression, K-nearest neighbour regression, and HGBoost; the results were compared based on error and R2 values, and the best model was chosen for its high R2 and low error. HGBoost also scales well to large datasets: its histogram-based approach reduces the cost of finding splits and allows efficient parallelization, whereas neural network models such as a multilayer perceptron (MLP) require iterative optimization by backpropagation, which can be computationally intensive and time-consuming.

5. Summary and conclusions

For the city of Hyderabad, a spatiotemporal air quality investigation was performed for the period from January 2018 to December 2019, and PM2.5 was modelled using multilinear regression, KNN, and HGBoost regression. The findings show that HGBoost regression outperforms the other two techniques for PM2.5 modelling.

  • The spatiotemporal air quality study revealed elevated levels of PM2.5 and NO2 at several monitoring stations.

  • The highest measured PM2.5 concentration was 130.54 μg/m3, which exceeded the CPCB air quality regulation of 60 μg/m3.

  • Similarly, the highest NO2 level was 130.59 μg/m3, which exceeded the CPCB guideline of 80 μg/m3.

  • There were seasonal changes in PM2.5 concentrations, with average winter levels 68% greater than summer levels.

  • Positive correlations were found between PM2.5 and various pollutants, with NO2 displaying the highest correlation (Pearson coefficient of 0.54).

  • Conversely, meteorological parameters were negatively correlated with PM2.5. Wind direction and wind speed showed the strongest negative correlations (−0.62 and −0.24, respectively).

  • Meteorological factors had a greater influence on PM2.5 modelling than pollutant data.

  • Using both meteorological and pollution data, the HGBoost3 model performed remarkably well, with R2 values of 0.981 and 0.859 during training and testing, respectively.

  • The HGBoost3 model has the lowest errors, with MAE and RMSE of 5.717 and 7.647 μg/m3, respectively.

  • As a result, the HGBoost3 model was chosen as the study’s best PM2.5 forecasting model.

The research emphasizes the urgency of addressing air pollution in Hyderabad, given its adverse impacts on public health, the environment, and overall quality of life. Machine learning techniques, combined with a multi-faceted approach, offer hope for positive change, leading to cleaner air and a healthier future for the residents of Hyderabad and other polluted cities worldwide. The findings indicate that air quality in the city is influenced by a combination of local emissions, meteorological conditions, and regional factors. Machine learning models have proven to be effective tools for predicting PM2.5 levels, allowing for better understanding and forecasting of air pollution episodes. The significance of this study lies in its potential to assist policymakers and environmental authorities in implementing targeted measures to tackle air pollution effectively. By identifying key contributors to PM2.5 concentrations, authorities can design more focused and impactful policies, such as stricter emission controls, improved urban planning, and public awareness campaigns. However, the effectiveness of these measures depends on collaboration between different stakeholders, including government bodies, industries, and the general public. Continuous monitoring, data collection, and model refinement will be crucial to maintaining the accuracy and reliability of the predictions.

Correction

This article has been corrected with minor changes. These changes do not impact the academic content of the article.

Author contributions statement

Conceptualization and supervision: AM; writing, review and editing: AM, GPR, PRS, AKS and HGA; data curation and formal analysis: GPR, AM, AKS and PRS; evidence collection, review, and editing: GPR, AM, PRS, HGA, HA and AAA.

Availability of data and material

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Acknowledgments

The authors express their gratitude to the editor and anonymous reviewers for their valuable and insightful feedback, which significantly enhanced the quality of this paper. Furthermore, the authors extend their thanks to the CPCB (Central Pollution Control Board) for providing the air pollution data via its online CAAQMS website, enabling the research and analysis presented in this study. We are also thankful to the editors and potential reviewers.

Disclosure statement

On behalf of all authors, the corresponding author states that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Funding

The author(s) reported there is no funding associated with the work featured in this article.

References

  • Analitis, A., Barratt, B., Green, D., Beddows, A., Samoli, E., Schwartz, J., & Katsouyanni, K. (2020). Prediction of PM2.5 concentrations at the locations of monitoring sites measuring PM10 and NOx, using generalized additive models and machine learning methods: A case study in London. Atmospheric Environment, 240, 117757. https://doi.org/10.1016/j.atmosenv.2020.117757
  • Athanasiadis, I. N., Kaburlasos, V. G., Mitkas, P. A., & Petridis, V. (2003). Applying machine learning techniques on air quality data for real-time decision support. In First international NAISO symposium on Information Technologies in Environmental Engineering (ITEE’2003), Gdansk, Poland.
  • Baccarelli, A. A., Zheng, Y., Zhang, X., Chang, D., Liu, L., Wolf, K. R., Zhang, Z., McCracken, J. P., Díaz, A., Bertazzi, P. A., Schwartz, J., Wang, S., Kang, C.-M., Koutrakis, P., & Hou, L. (2014). Air pollution exposure and lung function in highly exposed subjects in Beijing, China: A repeated-measure study. Particle and Fibre Toxicology, 11(1), 51. https://doi.org/10.1186/s12989-014-0051-7
  • Caselli, M., Trizio, L., de Gennaro, G., & Ielpo, P. (2009). A simple feedforward neural network for the PM10 forecasting: Comparison with a radial basis function network and a multivariate linear regression model. Water, Air, and Soil Pollution, 201(1–4), 365. https://doi.org/10.1007/s11270-008-9950-2
  • Chaloulakou, A., Kassomenos, P., Spyrellis, N., Demokritou, P., & Koutrakis, P. (2003). Measurements of PM10 and PM2.5 particle concentrations in Athens, Greece. Atmospheric Environment, 37(5), 649. https://doi.org/10.1016/S1352-2310(02)00898-1
  • Cohen, A. J., Brauer, M., Burnett, R., Anderson, H. R., Frostad, J., Estep, K., Balakrishnan, K., Brunekreef, B., Dandona, L., Dandona, R., Feigin, V., Freedman, G., Hubbell, B., Jobling, A., Kan, H., Knibbs, L., Liu, Y., Martin, R. & Murray, C. J. L. (2017). Estimates and 25-year trends of the global burden of disease attributable to ambient air pollution: An analysis of data from the Global Burden of Diseases study 2015. Lancet, 389(10082), 1907–23. https://doi.org/10.1016/S0140-6736(17)30505-6
  • Das, K., Chatterjee, N. D., Jana, D., & Bhattacharya, R. J. (2023). Application of land-use regression model with regularization algorithm to assess PM2.5 and PM10 concentration and health risk in Kolkata Metropolitan. Urban Climate, 49, 101473. https://doi.org/10.1016/j.uclim.2023.101473
  • De Vito, L., Chatterton, T., Namdeo, A., Nagendra, S., Gulia, S., Goyal, S., Bell, M., Goodman, P., Longhurst, J., Hayes, E., Kumar, R., Sethi, V., Sengupta, B., Ramadurai, G., Majmuder, S., Menon, J. S., Turamari, M. N., & Barnes, J. (2018). Air pollution in Delhi: A review of past and current policy approaches. WIT Transactions on Ecology and the Environment, 230, 441–451. https://doi.org/10.2495/AIR180411
  • Feng, X., Li, Q., Zhu, Y., Hou, J., Jin, L., & Wang, J. (2015). Artificial neural networks forecasting of PM2.5 pollution using air mass trajectory based geographic model and wavelet transformation. Atmospheric Environment, 107, 118–128. https://doi.org/10.1016/j.atmosenv.2015.02.030
  • Goudarzi, G., Hopke, P. K., & Yazdani, M. (2021a). Forecasting PM2.5 concentration using artificial neural network and its health effects in Ahvaz, Iran. Chemosphere, 283, 131285. https://doi.org/10.1016/j.chemosphere.2021.131285
  • Goudarzi, G., Hopke, P. K., & Yazdani, M. (2021b). Forecasting PM2.5 concentration using artificial neural network and its health effects in Ahvaz, Iran. Chemosphere, 283, 131285. https://doi.org/10.1016/j.chemosphere.2021.131285
  • Gregorio, J., Gouveia-Caridade, C., & Caridade, P. J. S. B. (2022). Modeling PM2.5 and PM10 using a Robust simplified linear regression machine learning algorithm. Atmosphere, 13(8), 1334. https://doi.org/10.3390/atmos13081334
  • Grell, G., Peckham, S., Schmitz, R., McKeen, S., Frost, G., Skamarock, W., & Eder, B. (2005). Fully coupled “online” chemistry within the WRF model. Atmospheric Environment, 39(37), 6957–6975. https://doi.org/10.1016/j.atmosenv.2005.04.027
  • Gronlund, C. J., Humbert, S., Shaked, S., O’Neill, M. S., & Jolliet, O. (2015). Characterizing the burden of disease of particulate matter for life cycle impact assessment. Air Quality, Atmosphere, & Health, 8(1), 29–46. https://doi.org/10.1007/s11869-014-0283-6
  • Gulia, S., Nagendra, S. M. S., Barnes, J., & Khare, M. (2018). Urban local air quality management framework for non-attainment areas in Indian cities. Science of the Total Environment–1318.
  • Huang, Y., Wu, Y., & Zhang, W. (2020). Comprehensive identification and isolation policies have effectively suppressed the spread of COVID-19. Chaos, Solitons & Fractals, 139, 110041. https://doi.org/10.1016/j.chaos.2020.110041
  • IQAir. (2020). World air quality report 2020 (pp. 1–35).
  • Jung, C.-R., Hwang, B.-F., & Chen, W.-T. (2015). Incorporating long-term satellite-based aerosol optical depth, localized land use data, and meteorological variables to estimate ground-level PM2.5 concentrations in Taiwan from 2005 to 2015. Environmental Pollution, 237(1), 1000–1010. https://doi.org/10.1016/j.envpol.2017.11.016
  • Kapoor, M. (2017). Managing ambient air quality using ornamental plants-an alternative approach. Universal Journal of Plant Science, 5(1), 1–9. https://doi.org/10.13189/ujps.2017.050101
  • Kim, E., Park, H., Hong, Y. C., Ha, M., Kim, Y., Kim, B. N., Kim, Y., Roh, Y. M., Lee, B. E., Ryu, J. M., Kim, B. M., & Ha, E. H. (2014). Prenatal exposure to PM10 and NO2 and children’s neurodevelopment from birth to 2 months of age: Mothers and Children’s Environmental Health (MOCEH) study. Science of the Total Environment, 481, 439–445. https://doi.org/10.1016/j.scitotenv.2014.01.107
  • Kumar, A., & Gurjar, B. R. (2019). Low-cost sensors for air quality monitoring in developing countries – A critical view. Asian Journal of Water, Environment & Pollution, 16(2), 65–70. https://doi.org/10.3233/AJW190021
  • Kumar, S., Mishra, S., & Singh, S. K. (2020). A machine learning-based model to estimate PM2.5 concentration levels in Delhi’s atmosphere. Heliyon, 6(11), e05618. https://doi.org/10.1016/j.heliyon.2020.e05618
  • Lancet. (2020). Health and economic impact of air pollution in the states of India: The global burden of disease study 2019. The Lancet Planetary Health, 5(1), e25–e38. https://doi.org/10.1016/S2542-5196(20)30298-9
  • Le, V., Bui, C. T., & Cha, S. (2020). Spatiotemporal deep learning model for citywide air pollution interpolation and prediction. Proceedings of the 2020 IEEE International Conference on Big Data and Smart Computing (BigComp), Busan, Korea (South), 2020, 55–62. IEEE. https://doi.org/10.1109/BigComp48618.2020.00-99
  • Li, X., Peng, L., Hu, Y., Shao, J., & Chi, T. (2016). Deep learning architecture for air quality predictions. Environmental Science and Pollution Research, 23(22), 408–422. https://doi.org/10.1007/s11356-016-7812-9
  • Li, X., Song, H., Zhai, S., Lu, S., Kong, Y., Xia, H., & Zhao, H. (2019). Particulate matter pollution in Chinese cities: Areal-temporal variations and their relationships with meteorological conditions (2015–2017). Environmental Pollution, 246, 11–18. https://doi.org/10.1016/j.envpol.2018.11.103
  • Liu, H., Yan, G., Duan, Z., & Chen, C. (2021). Intelligent modeling strategies for forecasting air quality time series: A review. Applied Soft Computing, 102, 106957. https://doi.org/10.1016/j.asoc.2020.106957
  • Ma, X., Chen, T., Ge, R., Cui, C., Xu, F., & Lv, Q. (2022). Time series-based PM2.5 concentration prediction in Jing-Jin-Ji area using machine learning algorithm models. Heliyon, 8(9), e10691. https://doi.org/10.1016/j.heliyon.2022.e10691
  • Nhat-Duc, H., & Van Duc, T. (2023). Comparison of histogram-based gradient boosting classification machine, random forest, and deep convolutional neural network for pavement raveling severity classification. Automation in Construction, 148, 104767. https://doi.org/10.1016/j.autcon.2023.104767
  • Ni, X. Y., Huang, H., & Du, W. P. (2017). Relevance analysis and short-term prediction of PM2.5 concentrations in Beijing based on multisource data. Atmospheric Environment, 150, 146–161. https://doi.org/10.1016/j.atmosenv.2016.11.054
  • Niska, H., Hiltunen, T., Karppinen, A., Ruuskanen, J., & Kolehmainen, M. (2004). Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2), 159. https://doi.org/10.1016/j.engappai.2004.02.002
  • Raju, L., Gandhimathi, R., Mathew, A., & Ramesh, S. T. (2022). Spatio-temporal modelling of particulate matter concentrations using satellite derived aerosol optical depth over coastal region of Chennai in India. Ecological Informatics, 69, 101681. https://doi.org/10.1016/j.ecoinf.2022.101681
  • Roychowdhury, A., & Somvanshi, A. (2020). Breathing space: How to track and report air pollution under the National Clean Air Programme. Centre for Science and Environment.
  • Siew, Y. L., Ying Chin, L., Mah, P., & Wee, J. (2008). ARIMA and integrated ARFIMA models for forecasting air pollution index in Shah Alam. The Malaysian Journal of Analytical Sciences, 12.
  • Silva, R., West, J., Zhang, Y., Anenberg, S., Lamarque, J.-F. T., Shindell, D., Collins, W., Dalsøren, S., Faluvegi, G., Folberth, G., Horowitz, L., Nagashima, T., Naik, V., Rumbold, S., Skeie, R., Sudo, K., Takemura, T., Bergmann, D. & Szopa, S. (2013). Global premature mortality due to anthropogenic outdoor air pollution and the contribution of past climate change. Environmental Research Letters, 8(3), 034005. https://doi.org/10.1088/1748-9326/8/3/034005
  • Singh, P., Gupta, K., & Rai, P. (2013). Identifying pollution sources and predicting urban air quality using ensemble learning methods. Atmospheric Environment, 80, 426–437. https://doi.org/10.1016/j.atmosenv.2013.08.023
  • Tsai, Y. T., Zeng, Y. R., & Chang, Y. S. (2018). Air pollution forecasting using RNN with LSTM. Proceedings of the 2018 IEEE 16th International Conference on Dependable, Autonomic and Secure Computing, 16th International Conference on Pervasive Intelligence and Computing, 4th International Conference on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), Athens, Greece (pp. 1074–1079). https://doi.org/10.1109/DASC/PiCom/DataCom/CyberSciTec.2018.00178
  • Varaprasad, V., Kanawade, V. P., & Narayana, A. C. (2021). Spatio-temporal variability of near-surface air pollutants at four distinct geographical locations in Andhra Pradesh State of India. Environmental Pollution, 268, 115899. https://doi.org/10.1016/j.envpol.2020.115899
  • Wang, Z., Wei, W., & Zheng, F. (2020). Effects of industrial air pollution on the technical efficiency of agricultural production: Evidence from China. Environmental Impact Assessment Review, 83, 106407. https://doi.org/10.1016/j.eiar.2020.106407
  • WHO. (2021). WHO global air quality guidelines. World Health Organization. https://apps.who.int/iris/handle/10665/345329
  • World Health Organization. (2014). World Health Statistics. World Health Organization. Retrieved September 20, 2021, from https://apps.who.int/iris/bitstream/handle/10665/112738/9789240692671_eng.pdf;jsessionid=F52821DBF1D54D49FB35C09515956A25?sequence=1.
  • Zhang, P., Ma, W., Wen, F., Liu, L., Yang, L., Song, J., Wang, N., & Liu, Q. (2021). Estimating PM2.5 concentration using the machine learning GA-SVM method to improve the land use regression model in Shaanxi, China. Ecotoxicology and Environmental Safety, 225, 112772. https://doi.org/10.1016/j.ecoenv.2021.112772