
Cogent Engineering Volume 10, 2023 - Issue 1


Open access


Civil & Environmental Engineering

Air quality analysis and PM_2.5 modelling using machine learning techniques: A study of Hyderabad city in India

Aneesh Mathew, Department of Civil Engineering, National Institute of Technology, Tiruchirappalli, India

P R Gokul, Department of Civil Engineering, National Institute of Technology, Tiruchirappalli, India

Padala Raja Shekar, Department of Civil Engineering, National Institute of Technology, Tiruchirappalli, India

K. S. Arunab, Department of Civil Engineering, National Institute of Technology, Tiruchirappalli, India

Hazem Ghassan Abdo, Geography Department, Faculty of Arts and Humanities, Tartous University, Tartous, Syria. Correspondence: [email protected]. ORCID: https://orcid.org/0000-0001-9283-3947

Hussein Almohamad, Department of Geography, College of Arabic Language and Social Studies, Qassim University, Buraydah, Saudi Arabia

Ahmed Abdullah Al Dughairi, Department of Geography, College of Arabic Language and Social Studies, Qassim University, Buraydah, Saudi Arabia


Article: 2243743 | Received 08 Feb 2023, Accepted 29 Jul 2023, Published online: 13 Aug 2023

https://doi.org/10.1080/23311916.2023.2243743


Abstract

Rapid urbanization and industrialization in many parts of the world have made air pollution a global public health problem. A study conducted by the Swiss organization IQAir indicated that 22 of the world’s 30 most polluted cities are in India, which makes the problem particularly relevant there. Exposure to air pollutants has both acute (short-term) and chronic (long-term) impacts on health. Among the major air pollutants, particulate matter 2.5 (PM_2.5) is the most harmful, and long-term exposure to it can impair lung function. Pollutant concentrations vary temporally and depend on the local meteorology and emissions at a given geographic location. PM_2.5 forecasting models can support strategies for evaluating expected hazardous levels of air pollution and alerting the public to them. Accurate measurement and forecasting of pollutant concentrations are critical for assessing air quality and making informed strategic decisions. Recently, data-driven machine learning algorithms for PM_2.5 forecasting have received a lot of attention. In this work, a spatio-temporal analysis of air quality was first performed for Hyderabad, indicating that average PM_2.5 concentrations during the winter were 68% higher than those during the summer. Following that, PM_2.5 modelling was done using three different techniques: multilinear regression, K-nearest neighbours (KNN), and histogram-based gradient boost (HGBoost). Among these, the HGBoost regression model, which used both pollution and meteorological data as inputs, outperformed the other two techniques. During testing, the model achieved an R² value of 0.859, indicating close agreement with the observed data. Additionally, the model had the lowest Mean Absolute Error (MAE), 5.717 μg/m³, and a Root Mean Square Error (RMSE) of 7.647 μg/m³, further confirming its accuracy in predicting PM_2.5 concentrations.
In our investigation, the HGBoost3 model outperformed the other PM_2.5 models, with the lowest error and the highest R² value. This study contributes by incorporating the spatiotemporal relationship between air pollutants and meteorological variables in predicting air quality. This method has the potential to support the development of more precise air pollution forecast models.

Keywords:

air pollution
seasonal variation
forecasting
PM_2.5 modelling
environment
HGBoost regression

1. Introduction

Air, a vital element for sustaining life on Earth, is under pressure from both human-made and natural factors. Industrialization, volcanic eruptions, forest fires, agricultural burning, and urbanization, among others, have collectively contributed to a decline in air quality worldwide (Raju et al., Citation2022). Approximately eighty percent of global cities and ninety-eight percent of cities in middle-income nations exceed recommended air quality standards. This escalation in air pollution leads to detrimental consequences, including accelerated climate change resulting in extreme weather events, economic losses, reduced visibility, and millions of premature deaths annually (Jung et al., Citation2015). The main air pollutants include particulate matter, carbon dioxide, carbon monoxide, nitrogen oxides, sulfur oxides, and volatile organic compounds. Among these, anthropogenic fine particulate matter (PM_2.5), meaning particles with an aerodynamic diameter of less than 2.5 micrometres, poses a significant threat to air quality (Gregorio et al., Citation2022). Accurate forecasting of air pollutants and identification of pollution trends can aid scientists in devising effective emission control strategies. India has been grappling with air pollution for over a century, and the situation has worsened in recent decades due to rapid population growth, unplanned urbanization, and industrialization (Kapoor, Citation2017). Notably, 22 of the world’s 30 most polluted cities are in India (IQAir, Citation2020), and from 2008 to 2013, India ranked among the most polluted countries globally, according to the World Health Organization’s database (World Health Organization [WHO], Citation2014). Ambient air pollution in India contributes to approximately 17.8% of all fatalities, as reported by the Global Burden of Disease study (Lancet, Citation2020).
Among the major factors responsible for these deaths are ambient particulate matter (PM) and residential air pollution.

Over the past two decades, India has witnessed some of the most severe and widespread air quality degradation, making air pollution a critical concern for regulatory authorities. The post-monsoon season, particularly in the Indo-Gangetic Plains (IGP), is highly susceptible to severe pollution episodes. An alarming example occurred during a week in early November 2017, when PM_2.5 levels in Delhi exceeded the WHO guideline by a factor of 25 (and the Indian standard by a factor of 11), triggering an environmental health emergency (De Vito et al., Citation2018). Recognizing the gravity of the situation, the WHO updated its air quality guidelines, establishing revised annual and 24-hour requirements for six major pollutants (WHO, Citation2021). For instance, the annual average guideline values for PM₁₀ and PM_2.5 were reduced from 20 μg/m³ to 15 μg/m³ and from 10 μg/m³ to 5 μg/m³, respectively. Despite technological advancements aimed at improving air quality, significant transformative changes are yet to be fully realized (Gulia et al., Citation2018). The issue of air pollution continues to demand urgent attention and comprehensive efforts to safeguard public health and the environment.

The introduction of real-time monitoring stations in India heralded a significant technological advancement in the field of air quality monitoring. In 2006, these stations were initially set up in Delhi as a pilot project and were later expanded to numerous other cities after 2016 (Roychowdhury & Somvanshi, Citation2020). Over time, the number of Continuous Ambient Air Quality Monitoring (CAAQM) stations across the country has increased to 278, serving 147 cities. Additionally, India established the System of Air Quality and Weather Forecasting and Research (SAFAR) network, which combines manual and real-time monitoring with air quality forecasting capabilities. Initially implemented by the Indian Institute of Tropical Meteorology (IITM) in Pune for Delhi, SAFAR has now expanded its forecasting services to three other cities: Pune, Mumbai, and Ahmedabad (http://safar.tropmet.res.in/). These forecasting models have proven valuable in guiding policymakers to make informed strategic decisions. While a significant portion of the country’s air quality monitoring spectrum is covered by manual and real-time monitoring stations, researchers are actively exploring the use of low-cost sensors to measure air quality on a smaller spatiotemporal scale (A. Kumar & Gurjar, Citation2019). This approach holds promise for enhancing the granularity and coverage of air quality data, allowing for more detailed insights into local air pollution patterns.

The rising number of premature deaths associated with air pollution has drawn significant attention to the health impacts of PM_2.5 (Particulate Matter with a diameter of 2.5 micrometers or less) (Cohen et al., Citation2017; Silva et al., Citation2013). These tiny particles possess the ability to deeply penetrate the respiratory tract and reach the lungs, giving rise to respiratory and cardiovascular ailments (Gronlund et al., Citation2015). Exposure to PM_2.5 can cause irritation in the eyes, nose, throat, and lungs, leading to symptoms such as coughing, sneezing, runny nose, and shortness of breath. Furthermore, PM_2.5 exposure can severely impact respiratory function and worsen medical conditions such as asthma and heart disease. Studies have revealed that increased daily exposure to PM_2.5 is associated with a higher number of respiratory and cardiovascular hospital admissions, emergency department visits, and mortality rates (Baccarelli et al., Citation2014). Long-term exposure to fine particulate matter has also been linked to an elevated risk of chronic bronchitis, decreased lung function, and mortality from lung cancer and heart disease. Particularly vulnerable to the harmful effects of PM_2.5 are individuals with pre-existing lung and cardiac issues, as well as children and the elderly.

Numerous studies have investigated the spatial and temporal variations of pollutants (Analitis et al., Citation2020; Athanasiadis et al., Citation2003; Chaloulakou et al., Citation2003). Varaprasad et al. (Citation2021) conducted a study focusing on PM_2.5, Carbon Monoxide (CO), NOx, and sulphur dioxide (SO2), and observed distinct seasonal fluctuations for each pollutant. In the study area, PM_2.5 and PM₁₀ concentrations had a notable impact on air quality, with PM_2.5 mass concentrations being higher during the post-monsoon season (Das et al., Citation2023; Li et al., Citation2019). The study also found that PM_2.5 concentrations varied significantly during the day. Additionally, regional disparities were identified during the investigation. Furthermore, Li et al. (Citation2019) explored the correlation between meteorological factors and PM_2.5 concentration, noting that precipitation, relative humidity, air temperature, and wind speed showed a negative relationship with PM_2.5 concentration. These findings shed light on the complex interactions between air pollutants and meteorological conditions, contributing to a better understanding of air quality variations.

In light of the study conducted by Singh et al. (Citation2013), which established a significant correlation between daily death rates and air pollution statistics, the researchers found that PM_2.5 (particulate matter with a diameter of 2.5 micrometers or smaller) was particularly dangerous. This was attributed to its ability to penetrate the lung walls, causing severe health issues and potentially leading to increased mortality rates in regions with higher levels of this pollutant. To address the complexities of predicting future pollution levels, S. Kumar et al. (Citation2020) took a different approach. They employed general statistical methodologies, including multiple linear regression, to develop models capable of forecasting pollution levels over time. These models were then applied in another study, where researchers aimed to identify correlations between various characteristics and pollution patterns. The primary finding of this subsequent study was the successful identification of correlations between certain characteristics and air pollution patterns. However, it became evident that predicting various elements of time series data, such as trends, seasonality, and outliers, presented considerable challenges. The studies relied predominantly on simple statistical models, which proved inadequate in capturing the intricacies and nuances of pollution trends. The main limitations that hindered the accurate forecast of air pollution patterns were twofold. First, these simple statistical models lacked the ability to handle complex interdependencies and correlations between different variables that influence pollution levels. Air pollution is influenced by a multitude of factors, including industrial activities, traffic, weather conditions, and geographical features, making it a complex and multifaceted problem to model accurately. Second, the models faced difficulties in capturing seasonal variations and long-term trends. 
Air pollution patterns often exhibit seasonal fluctuations due to variations in weather conditions, human activities, and natural phenomena. These seasonal variations are crucial to understanding pollution levels and predicting future trends accurately. Furthermore, capturing long-term trends and potential outliers in pollution data is essential for making informed policy decisions and implementing effective mitigation strategies.

The importance of accurately modeling and anticipating air quality cannot be overstated, as it enables the public to be aware of potential health risks and empowers them to take precautionary measures. In recent years, machine learning approaches have gained popularity for forecasting temporal sequences of pollutants, and their application to air quality forecasting has been on the rise (Le et al., Citation2020). Forecasting models play a crucial role in developing effective strategies for assessing and informing the public about potential spikes in the air quality index (Zhang et al., Citation2021). These models generally fall into two main categories: simulation-based and data-driven approaches, each utilizing different methods to predict air pollution concentrations. The simulation-based approach integrates physical and chemical models to simulate the emission, transport, and chemical transformation of air pollutants. This method takes into account various factors such as emissions from different sources, meteorological conditions, and background characteristics to generate forecasts (Grell et al., Citation2005). While this approach can provide valuable insights, it does face certain challenges. One of the primary challenges is the presence of uncertainties in numerical models, which can impact the accuracy of predictions. Additionally, a lack of sufficient data on certain parameters can limit the precision of simulation-based models. On the other hand, data-driven approaches leverage statistical and machine learning techniques to identify patterns (Li et al., Citation2016). This approach proves to be effective, especially when dealing with high-dimensional data, as machine learning algorithms can efficiently discover relevant exposures that are related to desired health outcomes. 
Data-driven approaches are particularly useful when dealing with complex air pollution patterns influenced by numerous factors, as they can adapt and learn from the available data to make predictions (Ma et al., Citation2022).

Data-driven machine learning technologies have revolutionized the way researchers investigate the influence of various air contaminants on health outcomes (Caselli et al., Citation2009; Goudarzi et al., Citation2021a; Liu et al., Citation2021; Tsai et al., Citation2018; Ni et al., Citation2017; Niska et al., Citation2004; Siew et al., Citation2008). These advanced methodologies enable them to analyze and interpret complex data sets, considering multiple air pollutants simultaneously and their potential impact on human health. One critical area of research has been focused on early-life exposure to ambient air pollution and its effects on children’s neurodevelopment. Studies, such as the one conducted by E. Kim et al. (Citation2014), have provided mounting evidence that exposure to air pollution during early developmental stages may have adverse effects on cognitive development and neurobehavioral outcomes in children. This research highlights the importance of understanding the long-term consequences of air pollution exposure during critical periods of brain development and emphasizes the need for implementing measures to protect vulnerable populations, particularly children, from harmful air pollutants. Additionally, the implications of air pollution extend beyond neurological effects. A noteworthy study by Huang et al. (Citation2020) identified air pollution as a risk factor for obesity, particularly among individuals with a higher body mass index (BMI). This finding suggests that air pollution may have a broader impact on metabolic health, raising concerns about its contribution to the obesity epidemic. Further exploration into the relationship between air pollution and various health conditions, including obesity, is essential for formulating effective public health policies and interventions.

Air pollution’s far-reaching consequences extend beyond the realm of public health and also encompass detrimental effects on several industries, most notably agriculture. In China, extensive studies have revealed the significant impact of industrial air pollution on agricultural productivity. As a result, the agricultural sector experiences reduced marginal products, while various critical parameters like labor-capital dynamics undergo alterations (Wang et al., Citation2020). This intersection between air pollution and agricultural productivity inevitably ripples into broader economic implications for a country. The negative influence on agricultural output can hamper food production and supply, potentially leading to food shortages and increased prices. Moreover, decreased agricultural productivity may lead to reduced exports and hinder the overall growth of the economy. Additionally, air pollution’s effect on agriculture can disrupt rural livelihoods, forcing communities to cope with environmental challenges that affect their socio-economic well-being. For instance, farmers may face financial burdens due to decreased crop yields, exacerbating poverty and inequality. The economic impact of air pollution is not limited to agriculture alone; it also extends to other sectors. For instance, manufacturing and industrial activities might suffer from decreased productivity and increased costs due to air quality regulations and health-related absences among workers. Moreover, the healthcare sector experiences a surge in the demand for medical services, placing a strain on the healthcare system and draining resources that could be allocated elsewhere for development. In sum, the detrimental effects of air pollution on agriculture and various industries contribute to a vicious cycle that hinders a country’s overall economic development. 
Addressing air pollution becomes a crucial priority for sustainable economic growth, improved public health, and the well-being of communities and industries alike.

Due to rapid growth in both urbanization and industrialization, India has become highly vulnerable to atmospheric pollution in recent years, particularly in urban areas. Increasing pollutant levels have worsened ambient air quality, and the Air Quality Index has been rising at an alarming rate in major Indian cities. This motivated our focus on the air quality of major Indian cities and underlines the need for steps to address the issue. Pollutant forecasting and the discovery of patterns in air pollution improve the scientific knowledge required to develop an optimal emission control strategy. Long-term solutions to air pollution depend on the right strategic decisions, which are possible only if an accurate air quality measurement and forecasting system is in place. Machine learning-based PM_2.5 forecasting models offer the potential to develop methods for evaluating and warning the public about potentially harmful levels of air pollution (X. Feng et al., Citation2015; Goudarzi et al., Citation2021b).

The current study has multiple goals: (a) to conduct a spatiotemporal air quality analysis and explore seasonal changes in PM_2.5 levels; (b) to conduct a correlation analysis; and (c) to examine the relationship between input and output variables. Furthermore, the study applies several machine learning techniques to PM_2.5 modelling in order to establish the best PM_2.5 forecasting model.

2. Study area

The city of Hyderabad is the focus of the spatio-temporal analysis and PM_2.5 modeling in this study. As a fast-growing global city, Hyderabad has seen its air quality deteriorate significantly over the last decade, owing mostly to increased traffic and the presence of numerous industries in its northern and eastern sectors. Although Hyderabad is an important urban center, the literature review revealed a scarcity of studies on its air quality. Furthermore, the observed PM_2.5 levels in the city have consistently exceeded the prescribed limits set by the Central Pollution Control Board (CPCB). For these reasons, Hyderabad was chosen as the study area for this project.

2.1. Hyderabad

The city experiences a hot semi-arid climate, characterized by distinct weather patterns. From June to October, Hyderabad receives substantial rainfall from the southwest monsoon, which accounts for much of its annual precipitation. The mean annual temperature is around 26.6 °C, with monthly average temperatures ranging from 21 °C to 33 °C. May is the hottest month, with temperatures reaching 36–39 °C, while the coolest period runs from December to January, with temperatures ranging from 14.5 °C to 28 °C. Hyderabad has a diverse economy, driven by key industries such as information technology, pharmaceuticals and drugs, manufacturing, food, and hospitality. The city’s prominence as a major IT hub has earned it the moniker “Cyberabad,” with numerous IT parks and multinational corporations operating within its boundaries. The pharmaceutical and biotechnology sectors have also flourished in Hyderabad, which houses the headquarters and manufacturing units of several prominent companies. Additionally, the manufacturing, hospitality, and food industries contribute significantly to the city’s economic growth.

However, alongside its economic strengths, Hyderabad faces vehicular emissions as a significant contributor to pollution. The increasing number of vehicles, coupled with traffic congestion, contributes to air pollution in the city. Efforts are being made to address this issue, including the promotion of public transportation, the encouragement of electric vehicles, and the implementation of stricter emission norms for industries. These measures aim to improve air quality and reduce pollution levels, ensuring a sustainable and healthier environment for the residents of Hyderabad. Figure 1 shows the geographical location of Hyderabad district and the six monitoring stations.

Figure 1. (a) Map of Hyderabad (b) the locations of Hyderabad’s six Monitoring Stations (MS).

3. Data and methodology

Data collection and data pre-processing are the initial steps. A spatio-temporal air quality analysis of the study area is carried out before modeling to understand the trend of air pollution there. PM_2.5 modelling is then carried out using several machine learning models, and their performance is evaluated. Figure 2 depicts the general methodology used for PM_2.5 modelling.

Figure 2. General methodology for PM_2.5 modelling.

The research begins with data collection. The collected data then undergo pre-processing to prepare them for modeling. Pre-processing comprises four stages: data integration; data cleaning and organization; checking for missing data and outliers; and preparing distinct training, testing, and validation datasets. After pre-processing, a correlation analysis is conducted to reveal the relationships between the input and output variables, which helps identify the key factors influencing PM_2.5 levels in the study area. Next, a spatio-temporal air quality analysis is undertaken to explore how pollutant levels vary across geographical locations and over the study period. This analysis is crucial for understanding the distribution, trends, and patterns of air pollutants in different areas and their fluctuations over time. By identifying pollution hotspots, seasonal variations, and long-term trends, it informs the formulation of effective air quality management strategies and policies. In this study, we used geographic information systems (GIS), statistical methods, and machine learning techniques to process and analyze the datasets involved. For PM_2.5 modeling, three machine learning algorithms are used: multilinear regression, KNN regression, and HGBoost regression.
These algorithms were chosen for their capability to capture complex relationships between variables and provide accurate predictions. To assess the performance of each model, evaluation metrics such as MAE, RMSE, and R² are employed; these gauge how well the models predict PM_2.5 levels. Finally, the models are compared to determine the best PM_2.5 forecasting model for the study area. The selected model can serve as a tool for assessing and managing air quality in the city of Hyderabad, assisting authorities in implementing effective strategies and measures to safeguard public health from air pollution.
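The three evaluation metrics named above can be sketched in a few lines. This is a minimal illustration using their standard definitions, not the authors' code; the function names and the `observed`/`predicted` values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Square Error: weights large errors more heavily than MAE."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted daily PM2.5 values (ug/m3), illustration only.
observed = [40.0, 55.0, 62.0, 48.0, 70.0]
predicted = [42.0, 50.0, 65.0, 45.0, 68.0]

print(f"MAE  = {mae(observed, predicted):.3f} ug/m3")
print(f"RMSE = {rmse(observed, predicted):.3f} ug/m3")
print(f"R2   = {r2(observed, predicted):.3f}")
```

MAE and RMSE are expressed in the same units as the target (μg/m³ here), while R² is dimensionless, which is why the three are usually reported together.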

3.1. Data collection

Air pollutant and meteorological data were gathered from the all-India CAAQMS portal, which is administered and operated by the Central Pollution Control Board (CPCB) of India. The 24-hr meteorological and pollutant data were obtained from six Continuous Ambient Air Quality Monitoring Stations (CAAQMS) situated in Hyderabad city, for the period from January 2018 to December 2019.

3.1.1. Data pre-processing

Prior to initiating any modeling process, data pre-processing is a critical step to prepare the input data for machine learning models. This step involves removing inconsistent data, handling null or missing values, and addressing outliers that may disrupt the modeling process. Null or missing values in the data are identified and removed to ensure the quality and accuracy of the dataset. Outliers, which can arise from faulty readings or exceptional events, such as forest fires or religious gatherings, are also addressed, as they can significantly impact pollutant levels. The MLR, KNN, and the HGBoost regression models are used in this work. A variety of metrics are used to evaluate and compare their performance. R², MAE, RMSE, and MSE are employed as evaluation metrics to assess the models’ accuracy and effectiveness. Through comprehensive evaluation and comparison of these models using appropriate metrics, the study aims to identify the most suitable PM_2.5 forecasting model for the given study area. This will aid in making informed decisions to manage and mitigate air pollution effectively.
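The pre-processing steps described above (dropping missing values, handling outliers, and preparing separate datasets) might look roughly like the sketch below. The source does not state which outlier rule or split ratios were used, so the 1.5 × IQR rule, the 70/15/15 split, and all record values here are assumptions made purely for illustration.

```python
import random
import statistics

def drop_missing(records):
    """Remove records in which any field is None (a missing reading)."""
    return [r for r in records if all(v is not None for v in r.values())]

def remove_outliers_iqr(records, key, k=1.5):
    """Keep records whose `key` value lies within [Q1 - k*IQR, Q3 + k*IQR].

    The IQR rule is an assumed choice; the source does not name its method.
    """
    values = [r[key] for r in records]
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in records if low <= r[key] <= high]

def split_dataset(records, train=0.7, val=0.15, seed=0):
    """Shuffle and split into training, validation, and testing subsets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# Hypothetical daily records: one missing reading and one implausible spike.
raw = [{"pm25": v} for v in (40, 42, 45, 47, 50, 52, 55, 58, 60, 400, None)]

clean = remove_outliers_iqr(drop_missing(raw), key="pm25")
train_set, val_set, test_set = split_dataset(clean)
print(len(clean), len(train_set), len(val_set), len(test_set))
```

For time-ordered air quality data a chronological split is often preferred to a random shuffle, because it avoids leaking future information into training; the shuffle here only keeps the sketch short.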

4. Results and discussion

4.1. General

Over the two years from January 2018 to December 2019, a comprehensive spatio-temporal air quality analysis was performed. This analysis aimed to understand the variations in air pollutant levels across different locations and time periods. Additionally, a correlation analysis was conducted using the collected data to reveal the relationships between the variables. For PM_2.5 modeling, three different regression techniques were employed: multilinear regression, K-nearest neighbor regression, and HGBoost regression. These models were used to predict PM_2.5 levels based on the available data. Following the modeling process, the results obtained from the three models were compared.
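To make one of the three techniques concrete, the sketch below implements K-nearest neighbor regression from scratch: the prediction for a new day is the mean PM_2.5 of the k most similar training days. The feature choice (temperature and wind speed), the value of k, and every number are hypothetical, and the study's actual inputs and hyperparameters are not given in this section.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Predict the target for `query` as the mean target of its k nearest
    training points, using Euclidean distance in feature space."""
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical training data: (temperature in degC, wind speed in m/s) -> PM2.5 (ug/m3).
train_X = [(18.0, 1.0), (20.0, 1.5), (22.0, 2.0),
           (30.0, 3.0), (32.0, 3.5), (34.0, 4.0)]
train_y = [70.0, 65.0, 60.0, 40.0, 35.0, 30.0]

cool_calm_day = knn_predict(train_X, train_y, (21.0, 1.6))
hot_windy_day = knn_predict(train_X, train_y, (33.0, 3.8))
print(cool_calm_day, hot_windy_day)
```

Because the distance mixes features with different units, inputs are normally standardized before KNN is applied; that step is omitted here for brevity.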

4.2. Spatio-temporal air quality analysis

Figure 3 is a box plot showing the variation of the pollutants at each of the Monitoring Stations (MS) from January 2018 to December 2019. PM_2.5, NO₂, SO₂, CO, and ozone were analyzed for all six stations. The box plot is based on the daily average data for all the pollutants. The box plot for CO was plotted separately because CO is reported in different units. Table 1 provides the air quality standards set by the CPCB for each of the pollutants to achieve satisfactory conditions.
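A box plot summarizes each station's daily averages with five numbers: minimum, first quartile, median, third quartile, and maximum. The sketch below computes them with the standard library; the daily values are hypothetical, and the "inclusive" quartile method is an assumed choice, since plotting tools differ in how they interpolate quartiles.

```python
import statistics

def five_number_summary(values):
    """Return the statistics a box plot is drawn from."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }

# Hypothetical daily-average PM2.5 values (ug/m3) for one station.
station_daily = [50.0, 30.0, 70.0, 40.0, 60.0]

summary = five_number_summary(station_daily)
print(summary)
```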

Figure 3. Variation of pollutants for different stations over the period from January 2018 to December 2019.

Table 1. Pollutant CPCB criteria for acceptable conditions


The air quality standard for PM_2.5 set by the CPCB for satisfactory conditions is 60 μg/m³. Daily average PM_2.5 concentrations were found to exceed this limit at every station, and most of the exceedances were observed during the winter season. During the two-year study period, 246 days had an average PM_2.5 concentration above the CPCB threshold of 60 μg/m³. The highest PM_2.5 concentration during the study period, 130.54 μg/m³, was recorded at MS6 on 6 November 2019. The median value of 52.04 μg/m³ obtained for MS5 indicates that PM_2.5 pollution at MS5 is relatively high compared with the other stations (Table 2).
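Counting exceedance days amounts to comparing each daily average with the threshold. The sketch below shows the idea with the 60 μg/m³ limit quoted in the text; the dates and concentrations are hypothetical and are not the study's data, and treating a value equal to the limit as compliant is an assumption.

```python
import statistics

CPCB_PM25_LIMIT = 60.0  # ug/m3, the threshold quoted in the text

def exceedance_days(daily_means, limit=CPCB_PM25_LIMIT):
    """Return the dates whose daily mean is strictly above the limit."""
    return [day for day, value in daily_means.items() if value > limit]

# Hypothetical daily-average PM2.5 values (ug/m3) keyed by date.
daily = {
    "2019-11-04": 58.2,
    "2019-11-05": 91.7,
    "2019-11-06": 104.5,
    "2019-11-07": 60.0,
    "2019-11-08": 47.3,
}

over = exceedance_days(daily)
median_value = statistics.median(daily.values())
print(len(over), "exceedance days; median =", median_value)
```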

Table 2. Variation in PM_2.5 levels among stations


A closer look at Figure 3 shows that Monitoring Station 1 (MS1) and MS5 have markedly higher NO₂ concentrations than the other monitoring sites. At MS1, the mean daily concentration of NO₂ is 48.62 μg/m³, and the maximum value observed during the monitoring period is 103.42 μg/m³, which exceeds the CPCB limit of 80 μg/m³ by 29%. At MS5, the mean daily concentration of NO₂ is 48.11 μg/m³, with a peak of 130.59 μg/m³ recorded on 9 February 2019. This peak is about 62% above the CPCB limit for NO₂ and is the highest NO₂ concentration reported during the entire monitoring period.
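The percentage by which a peak exceeds a regulatory limit is (peak − limit) / limit × 100. The sketch below applies this to the NO₂ peaks and the 80 μg/m³ limit quoted above; the function name is hypothetical, and because the result depends on rounding it may differ by a point from the percentages stated in the text.

```python
def percent_over_limit(concentration, limit):
    """Percentage by which `concentration` exceeds `limit`."""
    return (concentration - limit) / limit * 100.0

NO2_LIMIT = 80.0   # ug/m3, CPCB limit quoted in the text
MS1_PEAK = 103.42  # ug/m3, peak NO2 at MS1 quoted in the text
MS5_PEAK = 130.59  # ug/m3, peak NO2 at MS5 quoted in the text

ms1_excess = percent_over_limit(MS1_PEAK, NO2_LIMIT)
ms5_excess = percent_over_limit(MS5_PEAK, NO2_LIMIT)
print(f"MS1: {ms1_excess:.1f}% over limit, MS5: {ms5_excess:.1f}% over limit")
```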

Figure 4 is a box plot of CO levels across stations. As with NO₂, CO concentrations were highest at MS5, where the average CO concentration was 0.75 mg/m³. Although this is below the CPCB standard of 2 mg/m³, the CO concentration is highly variable, with a standard deviation of 0.37 mg/m³. Figure displays the locations of air quality monitoring stations (MS) 1–6. Notably, MS5 (Sanathnagar MS) is located at a heavily used crossroads. Because CO and NO₂ are mostly released by traffic-related fuel combustion, this location accounts for the higher values at this station. The concentrations of SO₂ and ozone at all sites remain within the CPCB's acceptable limits of 80 μg/m³ and 100 μg/m³, respectively. Figure 5 uses box plots to show the seasonal variation of PM_2.5 over the two-year period for the summer and winter seasons.

Figure 4. CO variations at various stations.

Figure 5. Seasonal variation of PM_2.5.

The seasonal pattern of PM_2.5 levels in Figure 5 shows a significant rise in concentration during the winter season. This sharp increase can be attributed to temperature inversion in the atmosphere, which is prevalent during the winter months. Across all monitoring stations, PM_2.5 concentration was on average 68% higher in winter than in summer. The maximum increase of 82% was recorded at MS2, while the minimum change of 52% was observed at MS3. MS6 had the highest average winter PM_2.5 concentration, 69.12 μg/m³, while MS4 had the lowest average summer concentration, 30.69 μg/m³. Table 3 summarises the average PM_2.5 concentrations at all sites during the winter and summer seasons.

Table 3. Seasonal variation in PM_2.5 across different stations


4.3. Correlation analysis

A correlation analysis was carried out on the acquired data to better understand the relationship between the input and output variables. The analysis was performed separately for meteorological factors and pollutants. Table 4 describes the meteorological parameters used in the study.

Table 4. Information on the meteorological input variables


The relationship between the meteorological parameters and PM_2.5 levels is presented as a correlation heatmap in Figure 6.

Figure 6. Correlation heatmap of meteorological parameters with PM_2.5.

The results show that all the meteorological parameters used as input are negatively correlated with PM_2.5. With a Pearson correlation coefficient (r) of −0.62, wind direction (WD) has the strongest negative correlation. Wind speed (WS) is also negatively correlated, with an r value of −0.24. The negative correlation of PM_2.5 concentration with wind speed and direction reflects the fact that wind transports the light PM_2.5 particles in the air. A higher wind speed, or a change in wind direction away from the monitoring station, carries the PM_2.5 particles away and thereby reduces their concentration (Raju et al., 2022).

The r value for temperature (Temp) was obtained as −0.22. The negative correlation of temperature with PM_2.5 concentration is due to the strong air convection during higher temperatures. As the atmospheric temperature increases, the land heats up more quickly than the air. This creates a disparity in temperature between the air near the land surface and the air above it. The warmer air near the surface becomes less dense and starts to rise through convection. The lighter PM_2.5 particles at the surface are transported upward with the ascending air during this intense convective upward movement. As a result, their concentration is reduced near the ground, contributing to a decrease in the overall level of PM_2.5 pollutants in the lower atmosphere.

The r values for relative humidity (RH) and solar radiation (SR) are −0.02 and −0.075, respectively, showing that both parameters have only a very weak linear relationship with PM_2.5.
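For readers who want to check a coefficient of this kind by hand, the following is a minimal sketch of the Pearson correlation coefficient using only the standard library. The function name and the five-point wind speed and PM_2.5 series are hypothetical and are not taken from the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either series is constant.
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily values: wind speed (m/s) and PM2.5 (ug/m3).
wind_speed = [0.5, 1.0, 1.5, 2.0, 2.5]
pm25 = [80.0, 70.0, 65.0, 50.0, 45.0]

r = pearson_r(wind_speed, pm25)
print(f"r = {r:.2f}")  # negative: PM2.5 falls as wind speed rises in this sample
```

Note that r only measures linear association. A value near zero, as reported for RH and SR, does not rule out a nonlinear dependence.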

Table 5 describes the pollutant parameters used in the study.

Table 5. Details about the pollutant input variables


To investigate the relationship between pollutant parameters and PM_2.5 levels, a correlation analysis was carried out and the results were visualised as a correlation heatmap (Figure 7). All pollutant parameters were positively correlated with PM_2.5. With an r value of 0.54, NO₂ had the strongest positive correlation with PM_2.5 of any pollutant. This suggests that NO₂ may be a key precursor of PM_2.5. With an r value of 0.32, ozone had the second strongest positive correlation. SO₂ has a mildly positive correlation with PM_2.5, with an r value of 0.22. CO has the lowest r value, indicating little linear association with PM_2.5 concentrations. The correlation analysis thus showed how the different input parameters are related to the dependent variable, PM_2.5. Although some parameters had little effect on the target variable, all parameters were used as inputs in the modelling process to ensure full consideration without adding computational complexity.

Figure 7. Pollutant parameter correlation heatmap with PM_2.5.

4.4. PM_2.5 modeling

PM_2.5 modelling was carried out using MLR, KNN and HGBoost. The data were split (80% for training and 20% for testing), and three models were built for each algorithm, each with a different set of input variables. The first model used only the meteorological parameters as independent variables, while the second model used the pollutant variables alone. The third model used a combination of meteorological and pollutant data as input for PM_2.5 modelling.
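The data preparation described above can be sketched as follows. This is an illustration under assumptions rather than the authors' pipeline: the text does not say whether the split was random or chronological, so a seeded random shuffle is assumed, and the helper names and the ten toy rows are hypothetical. The variable names follow those in the MLR3 equation.

```python
import random

POLLUTANTS = ["NO2", "SO2", "CO", "Ozone"]
METEOROLOGY = ["RH", "Temp", "WS", "WD", "SR"]

# Three input configurations, one per model variant of each algorithm.
FEATURE_SETS = {
    "meteorological": METEOROLOGY,
    "pollutant": POLLUTANTS,
    "combined": POLLUTANTS + METEOROLOGY,
}


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and return (train_rows, test_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def select_features(row, names):
    """Pick the named input columns from one record, in order."""
    return [row[name] for name in names]


# Hypothetical records; real rows would hold daily averages per station.
rows = []
for i in range(10):
    row = {name: float(i) for name in POLLUTANTS + METEOROLOGY}
    row["PM2.5"] = 50.0 + i
    rows.append(row)

train_rows, test_rows = train_test_split(rows, test_fraction=0.2, seed=42)
X_train = [select_features(r, FEATURE_SETS["combined"]) for r in train_rows]
y_train = [r["PM2.5"] for r in train_rows]
print(len(train_rows), len(test_rows), len(X_train[0]))
```

For time-ordered air quality data, a chronological split is a common alternative, because a random split lets neighbouring days fall on both sides of the partition.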

4.4.1. Multi Linear Regression (MLR)

Table 6 presents the results of the multilinear regression model. The MLR1 model, with only pollutant data as input, performed worst, with an R² of 0.345 during testing. Its error was also higher than that of the other two models, with an RMSE of 18.06 μg/m³ and an MAE of 14.552 μg/m³. During testing, the MLR2 model, which used meteorological parameters as input, performed better than MLR1, with a lower error and an R² value of 0.467.

Table 6. Results obtained for multi linear regression


However, the MLR3 model with both pollutant and meteorological data as input outperformed the other two models examined. It had the smallest error of the three models, with a MAE of 11.297 μg/m³ and an RMSE of 14.453 μg/m³. During training and testing, R² values were 0.577 and 0.581, respectively. The R² for testing in the MLR3 model increased by 68.4% and 24.4%, respectively, as compared to the MLR1 and MLR2 models.
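The percentage improvements quoted above are relative gains in testing R². A minimal sketch of that arithmetic, using the R² values reported in the text (the function and variable names are illustrative):

```python
def relative_gain(new, old):
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# Testing R2 values reported in the text for the three MLR variants.
r2_test = {"MLR1": 0.345, "MLR2": 0.467, "MLR3": 0.581}

gain_vs_mlr1 = relative_gain(r2_test["MLR3"], r2_test["MLR1"])
gain_vs_mlr2 = relative_gain(r2_test["MLR3"], r2_test["MLR2"])
print(f"MLR3 vs MLR1: +{gain_vs_mlr1:.1f}%")
print(f"MLR3 vs MLR2: +{gain_vs_mlr2:.1f}%")
```

These are relative changes, so they should not be read as absolute differences in R² (which are 0.236 and 0.114 here).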

The outcomes of the MLR3 model are depicted in Figure 9, a joint plot comparing the predicted and test values. The R² score for testing was 0.581, indicating a moderate fit. Furthermore, r between the predicted and test values was 0.76, indicating a strong positive association. The distribution plot with a kernel density estimator (KDE) in Figure 8 shows the distribution of errors (residuals) in the modelling results. The broad base of the distribution indicates that the errors vary widely. The peak at zero error is close to 0.03, suggesting a minor bias in the model's predictions. The equation of the line of best fit for MLR3 is given by:

Figure 8. Distribution plot for MLR3 model.

Figure 9. Joint plot for MLR3 model.

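\[
\begin{aligned}
\mathrm{PM}_{2.5} = {} & 72.975 + 0.328\,\mathrm{NO}_{2} + 0.279\,\mathrm{SO}_{2} + 0.090\,\mathrm{CO} \\
& + 0.281\,\mathrm{Ozone} - 0.147\,\mathrm{RH} - 0.144\,\mathrm{Temp} - 0.103\,\mathrm{WS} \\
& - 0.154\,\mathrm{WD} - 0.070\,\mathrm{SR}
\end{aligned}
\]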
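The fitted MLR3 equation can be evaluated directly. The sketch below copies the intercept and coefficients from the equation above; the function name and the input dictionary are hypothetical, and the inputs are assumed to be in the same units as the study's dataset, which this section does not restate.

```python
# Intercept and coefficients as given in the MLR3 equation.
INTERCEPT = 72.975
COEFFICIENTS = {
    "NO2": 0.328,
    "SO2": 0.279,
    "CO": 0.090,
    "Ozone": 0.281,
    "RH": -0.147,
    "Temp": -0.144,
    "WS": -0.103,
    "WD": -0.154,
    "SR": -0.070,
}


def predict_pm25_mlr3(inputs):
    """Evaluate the linear MLR3 equation for one record of inputs."""
    missing = set(COEFFICIENTS) - set(inputs)
    if missing:
        raise KeyError(f"missing inputs: {sorted(missing)}")
    return INTERCEPT + sum(COEFFICIENTS[k] * inputs[k] for k in COEFFICIENTS)


# Hypothetical record: every input zero except NO2, to isolate one term.
record = {name: 0.0 for name in COEFFICIENTS}
record["NO2"] = 10.0
estimate = predict_pm25_mlr3(record)
print(f"Estimated PM2.5: {estimate:.3f}")
```

Because the model is linear, each coefficient is the change in predicted PM_2.5 per unit change in that input with the others held fixed, so the positive pollutant terms and negative meteorological terms mirror the signs found in the correlation analysis.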

4.4.2. K-nearest neighbour regression model (KNN model)

Table 7 presents the results of the KNN regression model, which used K = 2. The KNN1 model, with pollutant data alone as input, performed worst, with an R² of 0.471 during testing. Its error was also higher than that of the other two models, with an RMSE of 16.229 μg/m³ and an MAE of 11.717 μg/m³. During testing, the KNN2 model, with meteorological parameters as input, outperformed KNN1, with a lower error and an R² value of 0.498.

Table 7. Results obtained for K-nearest neighbour regression


Among the three models evaluated, the KNN3 model, which utilized both pollutant and meteorological data as input, demonstrated superior performance. With a MAE (Mean Absolute Error) of 9.137 μg/m³ and an RMSE (Root Mean Square Error) of 12.594 μg/m³, it exhibited the lowest error compared to the other models. The KNN3 model achieved R² values of 0.897 and 0.682 during the training and testing periods, respectively. Notably, the R² value for testing in the KNN3 model exhibited a significant improvement, increasing by 44.8% compared to the KNN1 model and 36.9% compared to the KNN2 model. These results underscore the KNN3 model’s superior ability to accurately predict PM_2.5 levels and outperform the other models in the evaluation.
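To make the K = 2 setting concrete, the following is a minimal sketch of K-nearest neighbour regression. It assumes Euclidean distance, an unweighted mean of the neighbours' targets, and unscaled inputs; the study does not state its distance metric, weighting, or scaling, and the four training points are hypothetical.

```python
import math


def knn_predict(train_X, train_y, query, k=2):
    """Predict a target as the mean over the k nearest training points."""
    if k < 1 or k > len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    # Pair each training point's distance to the query with its target value.
    by_distance = sorted(
        ((math.dist(x, query), y) for x, y in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / k


# Hypothetical training data: two input features and a PM2.5 target.
train_X = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]]
train_y = [40.0, 50.0, 90.0, 100.0]

prediction = knn_predict(train_X, train_y, [0.4, 0.0], k=2)
print(prediction)
```

With a small K such as 2, predictions on the training set lean heavily on each point's own neighbourhood, which is one plausible reason for a gap between training and testing R² like the one reported above.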

Figure 11 shows the results obtained for the KNN3 model as a joint plot of the predicted values against the test values. The R² score for testing was 0.682, and the correlation coefficient (r) between predicted and test values was 0.83.

Figure 10. Distribution plot for KNN3 model.

Figure 10 presents the distribution plot with a KDE, showing the distribution of errors (residuals) in the modelling results. The base of the distribution is wide, indicating considerable variance in the error values. Even so, the peak at zero error reached 0.05, which is higher than that of the KNN2 model. This higher peak suggests that the KNN3 model's predictions were more accurate and closer to the actual values than those of the KNN2 model.

Figure 11. Joint plot for KNN3 model.

4.4.3. Histogram based gradient boost (HGBoost) model

Histograms conveniently illustrate the distribution of data, more precisely the number of occurrences of each value when the data are repetitive. When the data fed to a model are discretised into bins, as in a histogram, the flexibility of the model increases. Combining a histogram-based algorithm with gradient boosting produces high-performance machine learning ensembles (Nhat-Duc & Van Duc, 2023). HGBoost lets the algorithm work on integer-based data structures (histograms) instead of relying on sorted continuous values when building the trees. It is well suited to capturing complex nonlinear relationships in the data, because combining gradient boosting with histogram-based techniques allows it to model and exploit interactions among features effectively.
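The binning step described above can be illustrated in isolation. The sketch below is not a gradient boosting implementation: it only shows how a continuous feature might be mapped to integer bin indices using quantile-based edges. The function names, the choice of four bins, and the sample values are all hypothetical.

```python
import bisect


def quantile_bin_edges(values, n_bins):
    """Return interior bin edges so that bins hold roughly equal counts."""
    ordered = sorted(values)
    edges = [ordered[i * len(ordered) // n_bins] for i in range(1, n_bins)]
    # Duplicate edges can occur with repeated values; keep each edge once.
    return sorted(set(edges))


def to_bin_index(value, edges):
    """Map a continuous value to the integer index of its bin."""
    return bisect.bisect_right(edges, value)


# Hypothetical NO2 readings (ug/m3), discretised into 4 bins.
no2 = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
edges = quantile_bin_edges(no2, n_bins=4)
binned = [to_bin_index(v, edges) for v in no2]
print(edges)
print(binned)
```

Once features are binned, a tree builder only needs to consider the bin boundaries as candidate split points (three here) rather than every distinct feature value, which is where the speed advantage of histogram-based boosting comes from.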

Table 8 presents the results of the HGBoost model. The HGBoost2 model, using only pollutant data as input, performed worst, with an R² value of 0.667 during testing. Its error was also higher than that of the other two models, with an RMSE of 11.748 μg/m³ and an MAE of 8.807 μg/m³. During testing, the HGBoost1 model, which used meteorological parameters as input, outperformed HGBoost2, with a lower error and an R² of 0.728.

Table 8. Results obtained for HGBoost


However, the HGBoost3 model, with both pollutant and meteorological data as input, performed the best of the three models examined. It had the lowest error of the three, with an MAE of 5.717 μg/m³ and an RMSE of 7.647 μg/m³. During training and testing, the R² values were 0.981 and 0.859, respectively. Based on the testing R² values above, the HGBoost3 model improved on the HGBoost1 and HGBoost2 models by about 18% and 28.78%, respectively.

Figure 13 shows the outcomes of the HGBoost3 model as a joint plot of the predicted values against the test values. The model achieved an R² value of 0.859 during testing, demonstrating strong predictive performance. Furthermore, r between predicted and test values was 0.93, showing a strong positive association. Figure 12 shows the distribution of errors (residuals) in the modelling results using a distribution plot with a KDE. The broad base of the distribution indicates that the errors are variable. Notably, the peak at zero error is about 0.04, indicating that the model is accurate and has a low bias in its predictions.
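The "peak at zero error" read from these distribution plots is a probability density, not a count. The following minimal sketch shows how such a value could be computed with a Gaussian kernel density estimate. The function name, the hand-picked bandwidth, and the observed and predicted values are hypothetical; plotting libraries usually choose the bandwidth automatically, so their peak heights will differ.

```python
import math


def gaussian_kde_at(point, samples, bandwidth):
    """Gaussian kernel density estimate of `samples`, evaluated at `point`."""
    if bandwidth <= 0 or not samples:
        raise ValueError("need a positive bandwidth and at least one sample")
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((point - s) / bandwidth) ** 2) for s in samples
    )


# Hypothetical observed and predicted PM2.5 values (ug/m3).
observed = [52.0, 61.0, 47.0, 70.0, 58.0, 66.0]
predicted = [50.0, 65.0, 49.0, 62.0, 57.0, 69.0]
residuals = [o - p for o, p in zip(observed, predicted)]

density_at_zero = gaussian_kde_at(0.0, residuals, bandwidth=5.0)
print(f"Estimated density at zero error: {density_at_zero:.3f} per ug/m3")
```

Since the area under a density curve is 1, a narrower error distribution centred on zero produces a taller peak there, which is why a higher peak is read as better accuracy in the comparisons above.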

Figure 12. Distribution plot for HGBoost3 model.

Figure 13. Joint plot for HGBoost3 model.

4.4.4. Comparison of model performance

From Figures 14 and 15, we can infer that the KNN3 model performed better than the MLR3 and KNN2 models. Its MAE and RMSE of 9.137 μg/m³ and 12.594 μg/m³ were lower than the corresponding values for the MLR3 and KNN2 models. The same does not hold when the comparison is made with the HGBoost3 model. Overall, the HGBoost3 model performed better than all the other models, including KNN3, with the lowest MAE and RMSE of 5.717 μg/m³ and 7.647 μg/m³, respectively. The MSE of HGBoost3, 58.487 (μg/m³)², was also the lowest of all the models. Hence, we can conclude that the HGBoost3 model gave the lowest error for PM_2.5 modelling.

Figure 14. MAE and RMSE for different models.

Figure 15. MSE for different models.

The comparison of R² values during training and testing, shown in Figure 16, provides a clear assessment of the models' performance. The figure shows that the KNN3 model outperformed both the MLR3 and KNN2 models. The testing R² for the KNN3 model was 0.682, which surpassed the R² values of MLR3 (0.581) and KNN2 (0.498). However, the HGBoost3 model outperformed all the other models in both the training and testing stages, achieving R² values of 0.981 and 0.859, respectively.
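The four scores used in this comparison can be computed as in the sketch below. It uses the standard textbook definitions of MAE, MSE, RMSE and R², which are assumed here because the section does not restate its formulas; the function name and the four sample values are hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return MAE, MSE, RMSE and R2 for paired observed/predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty series of equal length")
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical observed and predicted PM2.5 values (ug/m3).
observed = [50.0, 60.0, 70.0, 80.0]
predicted = [52.0, 58.0, 73.0, 79.0]

scores = regression_metrics(observed, predicted)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

MAE and RMSE carry the units of the target (μg/m³), MSE is in squared units, and R² is dimensionless, which is why the figures above plot MSE separately from MAE and RMSE.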

Figure 16. R² for different models.

The HGBoost3 model was found to be the most successful for PM_2.5 modelling based on minimal error and maximum R² values. The spatiotemporal air quality evaluations were carried out over a two-year period, from January 2018 to December 2019, and full results were presented. The correlation analysis clarified the connections between the input parameters and the output variable. PM_2.5 modelling was performed using multilinear regression, K-nearest neighbour regression, and HGBoost, and the results were compared based on error and R² values. Finally, the best model was chosen based on its high R² value and low error. HGBoost typically has faster training times compared to MLR and KNN, especially when dealing with large datasets. Neural network models such as the multilayer perceptron (MLP), by contrast, require iterative optimisation through backpropagation, which can be computationally intensive and time-consuming, whereas HGBoost's histogram-based approach allows efficient parallelisation.

5. Summary and conclusions

For the city of Hyderabad, a spatiotemporal air quality investigation was performed from January 2018 to December 2019. Multilinear regression, KNN, and HGBoost regression models were used to model PM_2.5. The findings show that HGBoost regression outperforms the other two approaches for PM_2.5 modelling.

Spatiotemporal air quality study revealed higher levels of PM_2.5 and NO₂ at several monitoring stations.
The highest measured PM_2.5 concentration was 130.54 μg/m³, which exceeded the CPCB air quality regulation of 60 μg/m³.
Similarly, the highest NO₂ level was 130.59 μg/m³, which exceeded the CPCB guideline of 80 μg/m³.
There were seasonal changes in PM_2.5 concentrations, with average winter levels 68% greater than summer levels.
Positive correlations were found between PM_2.5 and various pollutants, with NO₂ displaying the highest correlation (Pearson coefficient of 0.54).
Conversely, the meteorological parameters were negatively correlated with PM_2.5. Wind direction and wind speed showed the strongest negative correlations (−0.62 and −0.24, respectively).
Meteorological factors had a greater influence on PM_2.5 modelling than pollutant data.
Using both meteorological and pollution data, the HGBoost3 model performed remarkably well, with R² values of 0.981 and 0.859 during training and testing, respectively.
The HGBoost3 model has the lowest errors, with MAE and RMSE of 5.717 and 7.647 μg/m³, respectively.
As a result, the HGBoost3 model was chosen as the study’s best PM_2.5 forecasting model.

The research emphasizes the urgency of addressing air pollution in Hyderabad, given its adverse impacts on public health, the environment, and overall quality of life. By employing machine learning techniques and adopting a multi-faceted approach, there is hope for positive change, leading to cleaner air and a healthier future for the residents of Hyderabad and other polluted cities worldwide. The findings indicate that air quality in the city is influenced by a combination of local emissions, meteorological conditions, and regional factors. Machine learning models have proven to be effective tools in predicting PM_2.5 levels, allowing for better understanding and forecasting of air pollution episodes. The significance of this study lies in its potential to assist policymakers and environmental authorities in implementing targeted measures to tackle air pollution effectively. By identifying key contributors to PM_2.5 concentrations, authorities can design more focused and impactful policies, such as stricter emission controls, improved urban planning, and public awareness campaigns. However, it is essential to acknowledge that the effectiveness of these measures depends on collaborations between different stakeholders, including government bodies, industries, and the general public. Continuous monitoring, data collection, and model refinement will be crucial to maintain the accuracy and reliability of the predictions.

Correction

This article has been corrected with minor changes. These changes do not impact the academic content of the article.

Author contributions statement

Conceptualization and supervision: AM; writing, review and editing: AM, GPR, PRS, AKS and HGA; data curation and formal analysis: GPR, AM, AKS and PRS; evidence collection, review, and editing: GPR, AM, PRS, HGA, HA and AAA.

Availability of data and material

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Acknowledgments

The authors express their gratitude to the editor and anonymous reviewers for their valuable and insightful feedback, which significantly enhanced the quality of this paper. Furthermore, the authors extend their thanks to the CPCB (Central Pollution Control Board) for providing the air pollution data via its online CAAQMS website, enabling the research and analysis presented in this study.

Disclosure statement

On behalf of all authors, the corresponding author states that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Funding

The author(s) reported there is no funding associated with the work featured in this article.

References

Analitis, A., Barratt, B., Green, D., Beddows, A., Samoli, E., Schwartz, J., & Katsouyanni, K. (2020). Prediction of PM2.5 concentrations at the locations of monitoring sites measuring PM10 and NOx, using generalized additive models and machine learning methods: A case study in London. Atmospheric Environment, 240(June), 117757. https://doi.org/10.1016/j.atmosenv.2020.117757
Athanasiadis, I. N., Kaburlasos, V. G., Mitkas, P. A., & Petridis, V. (2003) Applying machine learning techniques on air quality data for real-time decision support. In: First international NAISO symposium on Information Technologies in Environmental Engineering (ITEE’2003), Gdansk, Poland.
Baccarelli, A. A., Zheng, Y., Zhang, X., Chang, D., Liu, L., Wolf, K. R., Zhang, Z., McCracken, J. P., Díaz, A., Bertazzi, P. A., Schwartz, J., Wang, S., Kang, C.-M., Koutrakis, P., & Hou, L. (2014). Air pollution exposure and lung function in highly exposed subjects in Beijing, China: A repeated-measure study. Particle and Fibre Toxicology, 11(1), 51. https://doi.org/10.1186/s12989-014-0051-7
Caselli, M., Trizio, L., de Gennaro, G., & Ielpo, P. (2009). A simple feedforward neural network for the pm 10 forecasting: Comparison with a radial basis function network and a multivariate linear regression model. Water, Air, and Soil Pollution, 201(1–4), 365. https://doi.org/10.1007/s11270-008-9950-2
Chaloulakou, A., Kassomenos, P., Spyrellis, N., Demokritou, P., & Koutrakis, P. (2003). Measurements of pm10 and PM2.5 particle concentrations in Athens, Greece. Atmospheric Environment, 37(5), 649. https://doi.org/10.1016/S1352-2310(02)00898-1
Cohen, A. J., Brauer, M., Burnett, R., Anderson, H. R., Frostad, J., Estep, K., Balakrishnan, K., Brunekreef, B., Dandona, L., Dandona, R., Feigin, V., Freedman, G., Hubbell, B., Jobling, A., Kan, H., Knibbs, L., Liu, Y., Martin, R. & Murray, C. J. L. (2017). Estimates and 25-year trends of the global burden of disease attributable to ambient air pollution: An analysis of data from the Global Burden of Diseases study 2015. Lancet, 389(10082), 1907–23. https://doi.org/10.1016/S0140-6736(17)30505-6
Das, K., Chatterjee, N. D., Jana, D., & Bhattacharya, R. J. (2023). Application of land-use regression model with regularization algorithm to assess PM2.5 and PM10 concentration and health risk in Kolkata Metropolitan. Urban Climate, 49, 101473. https://doi.org/10.1016/j.uclim.2023.101473
De Vito, L., Chatterton, T., Namdeo, A., Nagendra, S., Gulia, S., Goyal, S., Bell, M., Goodman, P., Longhurst, J., Hayes, E., Kumar, R., Sethi, V., Sengupta, B., Ramadurai, G., Majmuder, S., Menon, J. S., Turamari, M. N., & Barnes, J. (2018). Air pollution in Delhi: A review of past and current policy approaches. WIT Transactions on Ecology and the Environment, 230, 441–451. https://doi.org/10.2495/AIR180411
Feng, X., Li, Q., Zhu, Y., Hou, J., Jin, L., & Wang, J. (2015). Artificial neural networks forecasting of PM2.5 pollution using air mass trajectory based geographic model and wavelet transformation. Atmospheric Environment, 107, 118–128. https://doi.org/10.1016/j.atmosenv.2015.02.030
Goudarzi, G., Hopke, P. K., & Yazdani, M. (2021a). Forecasting PM2.5 concentration using artificial neural network and its health effects in Ahvaz, Iran. Chemosphere, 283, 131285. https://doi.org/10.1016/j.chemosphere.2021.131285
Goudarzi, G., Hopke, P. K., & Yazdani, M. (2021b). Forecasting PM2.5 concentration using artificial neural network and its health effects in Ahvaz, Iran. Chemosphere, 283(June), 131285. https://doi.org/10.1016/j.chemosphere.2021.131285
Gregorio, J., Gouveia-Caridade, C., & Caridade, P. J. S. B. (2022). Modeling PM2.5 and PM10 using a Robust simplified linear regression machine learning algorithm. Atmosphere, 13(8), 1334. https://doi.org/10.3390/atmos13081334
Grell, G., Peckham, S., Schmitz, R., McKeen, S., Frost, G., Skamarock, W., & Eder, B. (2005). Fully coupled “online” chemistry within the WRF model. Atmospheric Environment, 39(37), 6957–6975. https://doi.org/10.1016/j.atmosenv.2005.04.027
Gronlund, C. J., Humbert, S., Shaked, S., O’Neill, M. S., & Jolliet, O. (2015). Characterizing the burden of disease of particulate matter for life cycle impact assessment. Air Quality, Atmosphere, & Health, 8(1), 29e46. https://doi.org/10.1007/s11869-014-0283-6
Gulia, S., Nagendra, S. M. S., Barnes, J., & Khare, M. (2018). Urban local air quality management framework for non-attainment areas in Indian cities. Science of the Total Environment–1318.
Huang, Y., Wu, Y., & Zhang, W. (2020). Comprehensive identification and isolation policies have effectively suppressed the spread of COVID-19. Chaos, Solitons & Fractals, 139, 110041. https://doi.org/10.1016/j.chaos.2020.110041
IQAir, 2020. World air quality report. 2020 World Air Qual. Rep. 1–35.
Jung, C.-R., Hwang, B.-F., & Chen, W.-T. (2015). Incorporating long-term satellite-based aerosol optical depth, localized land use data, and meteorological variables to estimate ground-level PM2. 5 concentrations in Taiwan from 2005 to 2015. Environmental Pollution, 237(1), 1000–1010. https://doi.org/10.1016/j.envpol.2017.11.016
Kapoor, M. (2017). Managing ambient air quality using ornamental plants-an alternative approach. Universal Journal of Plant Science, 5(1), 1–9. https://doi.org/10.13189/ujps.2017.050101
Kim, E., Park, H., Hong, Y. C., Ha, M., Kim, Y., Kim, B. N., Kim, Y., Roh, Y. M., Lee, B. E., Ryu, J. M., Kim, B. M., & Ha, E. H. (2014). Prenatal exposure to PM10 and NO2 and children’s neurodevelopment from birth to 2 months of age: Mothers and Children’s Environmental Health (MOCEH) study. Science of the Total Environment, 481, 439–445. https://doi.org/10.1016/j.scitotenv.2014.01.107
Kumar, A., & Gurjar, B. R. (2019). Low-cost sensors for air quality monitoring in developing countries – A critical view. Asian Journal of Water, Environment & Pollution, 16(2), 65–70. https://doi.org/10.3233/AJW190021
Kumar, S., Mishra, S., & Singh, S. K. (2020). A machine learning-based model to estimate PM2.5 concentration levels in Delhi’s atmosphere. Heliyon, 6(11), e05618. https://doi.org/10.1016/j.heliyon.2020.e05618
Lancet. (2020). Health and economic impact of air pollution in the states of India: The global burden of disease study 2019. The Lancet Planetary Health, 5(1), e25–e38. https://doi.org/10.1016/S2542-5196(20)30298-9
Le, V., Bui, C. T., & Cha, S. (2020). Spatiotemporal deep learning model for citywide air pollution interpolation and prediction. Proceedings of the 2020 IEEE International Conference on Big Data and Smart Computing (BigComp), Busan, Korea (South), 2020, 55–62. IEEE. https://doi.org/10.1109/BigComp48618.2020.00-99
Li, X., Peng, L., Hu, Y., Shao, J., & Chi, T. (2016). Deep learning architecture for air quality predictions. Environmental Science and Pollution Research, 23(22), 408–422. https://doi.org/10.1007/s11356-016-7812-9
Li, X., Song, H., Zhai, S., Lu, S., Kong, Y., Xia, H., & Zhao, H. (2019). Particulate matter pollution in Chinese cities: Areal-temporal variations and their relationships with meteorological conditions (2015–2017). Environmental Pollution, 246, 11–18. https://doi.org/10.1016/j.envpol.2018.11.103
Liu, H., Yan, G., Duan, Z., & Chen, C. (2021). Intelligent modeling strategies for forecasting air quality time series: A review. Applied Soft Computing, 102, 106957. https://doi.org/10.1016/j.asoc.2020.106957
Ma, X., Chen, T., Ge, R., Cui, C., Xu, F., & Lv, Q. (2022). Time series-based PM2.5 concentration prediction in Jing-Jin-Ji area using machine learning algorithm models. Heliyon, 8(9), e10691. https://doi.org/10.1016/j.heliyon.2022.e10691
Nhat-Duc, H., & Van Duc, T. (2023). Comparison of histogram-based gradient boosting classification machine, random forest, and deep convolutional neural network for pavement raveling severity classification. Automation in Construction, 148, 104767. https://doi.org/10.1016/j.autcon.2023.104767
Ni, X. Y., Huang, H., & Du, W. P. (2017). Relevance analysis and short-term prediction of PM2. 5 concentrations in Beijing based on multisource data. Atmospheric Environment, 150, 146–161. https://doi.org/10.1016/j.atmosenv.2016.11.054
Niska, H., Hiltunen, T., Karppinen, A., Ruuskanen, J., & Kolehmainen, M. (2004). Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2), 159. https://doi.org/10.1016/j.engappai.2004.02.002
Raju, L., Gandhimathi, R., Mathew, A., & Ramesh, S. T. (2022). Spatio-temporal modelling of particulate matter concentrations using satellite derived aerosol optical depth over coastal region of Chennai in India. Ecological Informatics, 69, 101681. https://doi.org/10.1016/j.ecoinf.2022.101681
Roychowdhury, A., & Somvanshi, A. (2020). Breathing space; How to track and report air pollution under the national clean air programme, center for science and environment.
Siew, Y. L., Ying Chin, L., Mah, P., & Wee, J. (2008). ARIMA and integrated ARFIMA models for forecasting air pollution index in shah alam. The Malaysian Journal of Analytical Sciences, 12.
Silva, R., West, J., Zhang, Y., Anenberg, S., Lamarque, J.-F. T., Shindell, D., Collins, W., Dalsøren, S., Faluvegi, G., Folberth, G., Horowitz, L., Nagashima, T., Naik, V., Rumbold, S., Skeie, R., Sudo, K., Takemura, T., Bergmann, D. & Szopa, S. (2013). Global premature mortality due to anthropogenic outdoor air pollution and the contribution of past climate change. Environmental Research Letters, 8(3), 2. https://doi.org/10.1088/1748-9326/8/3/034005
Singh, P., Gupta, K., & Rai, P. (2013). Identifying pollution sources and predicting urban air quality using ensemble learning methods. Atmospheric Environment, 80, 426–437. https://doi.org/10.1016/j.atmosenv.2013.08.023
Tsai, Y. T., Zeng, Y. R., & Chang, Y. S. (2018). Air pollution forecasting using rnn with lstm. Proceedings of the 2018 IEEE 16th International Conference on Dependable, Autonomic and Secure Computing 16th International Conference on Pervasive Intelligence and Computing, 4th International Conference on Big Data Intelligence and Computing and Cyber Science and Technology Congress(DASC/PiCom/DataCom/CyberSciTech), Athens, Greece (pp. 1074–1079). https://doi.org/10.1109/DASC/PiCom/DataCom/CyberSciTec.2018.00178.
Varaprasad, V., Kanawade, V. P., & Narayana, A. C. (2021). Spatio-temporal variability of near-surface air pollutants at four distinct geographical locations in Andhra Pradesh State of India. Environmental Pollution, 268, 115899. https://doi.org/10.1016/j.envpol.2020.115899
Wang, Z., Wei, W., & Zheng, F. (2020). Effects of industrial air pollution on the technical efficiency of agricultural production: Evidence from china. Environmental Impact Assessment Review, 83, 106407. https://doi.org/10.1016/j.eiar.2020.106407
WHO. (2021). WHO “Global Air Quality Guidelines”. Organization. https://apps.who.int/iris/handle/10665/345329
World Health Organization. (2014). World Health Statistics. World Health Organization. Retrieved September 20, 2021, from https://apps.who.int/iris/bitstream/handle/10665/112738/9789240692671_eng.pdf;jsessionid=F52821DBF1D54D49FB35C09515956A25?sequence=1
Zhang, P., Ma, W., Wen, F., Liu, L., Yang, L., Song, J., Wang, N., & Liu, Q. (2021). Estimating PM2.5 concentration using the machine learning GA-SVM method to improve the land use regression model in Shaanxi, China. Ecotoxicology and Environmental Safety, 225, 112772. https://doi.org/10.1016/j.ecoenv.2021.112772

Air quality analysis and PM_2.5 modelling using machine learning techniques: A study of Hyderabad city in India

Abstract

1. Introduction