The effect of covariates on Soil Organic Matter and pH variability: a digital soil mapping approach using random forest model

Pages 215-232 | Received 12 Nov 2022, Accepted 19 Jan 2024, Published online: 29 Jan 2024

ABSTRACT

This research focuses on understanding the spatial variation of Soil Organic Matter (SOM) and pH levels in the North of Morocco. The study employs a comprehensive approach to enhance predictive modelling, incorporating the Boruta algorithm for effective environmental covariates selection and optimizing model parameters through hyperparameter optimization. Utilizing a Random Forest (RF) model with remote sensing indices and topographic features, the research predicts SOM and pH to identify key contributors to their spatial variability. SOM prediction saw significant success, with a notable correlation to remote sensing indices such as the RVI, NDVI, and TNDVI. These indices, indicative of vegetation health and productivity, emerged as primary influencers of SOM. In comparison, the influence of topographic features like elevation, slope, and aspect was found to be less significant. Conversely, predicting pH was challenging due to the minimal spatial variability within the dataset. Addressing this limitation could involve dataset expansion or alternative models for low-correlated data handling. Despite the RF model’s limited efficacy in pH prediction, an observable correlation between SOM and pH was identified, consistent with prior research. Areas with higher SOM exhibited lower pH values, indicating relative soil acidification from organic matter decomposition. The study’s RF model demonstrated potential in SOM prediction using remote sensing indices, but enhancing pH prediction is essential. Future research may explore dataset expansion, diverse sampling, or testing alternative predictive models for better performance with low-correlated datasets. The study offers valuable insights for advanced predictive model development and enriches understanding of soil management practices.

1. Introduction

The importance of soil organic matter (SOM) content and pH as critical indicators of soil quality and health has been extensively acknowledged by various authors (Aboutayeb, El Yousfi, and El Gharras Citation2020; He et al. Citation2021; Wulanningtyas et al. Citation2021). These properties significantly influence mineral availability, soil fertility, and crop yield (Laurent et al. Citation2020). The increase in SOM is linked to carbon sequestration, leading to enhanced physical, chemical, and biological properties, and improved nutrient bioavailability (Ashworth et al. Citation2017; Falahatkar et al. Citation2014; Garosi et al. Citation2022; Karchegani et al. Citation2012; Moussadek et al. Citation2014; Mrabet et al. Citation2012).

Soil pH and SOM content are crucial indicators of soil quality, with low levels of pH and SOM rendering the soil susceptible to degradation and various challenges like erosion, compaction, pollution, salinization, loss of biodiversity, and essential nutrients (Bünemann et al. Citation2018; Devkota et al. Citation2022; Doran et al. Citation2018; Zornoza et al. Citation2015). To safeguard soil quality, extensive soil measurements, laboratory analysis (Stone et al. Citation2016), and digital soil mapping techniques are utilized to estimate spatial variability in these properties (Asgari et al. Citation2020; Tajik, Ayoubi, and Zeraatpisheh Citation2020; Zeraatpisheh et al. Citation2017, Citation2019).

The crucial role of soil pH and SOM in supporting biological, physical, and chemical activities within the soil necessitates comprehensive monitoring on a large scale. However, spatial variations in these properties across diverse soil areas pose a challenge, as highlighted by various studies (Robertson, Crum, and Ellis Citation1993; Yu et al. Citation2017). Addressing these variations intensifies the monitoring process, requiring substantial human, organizational, and financial resources for accurate representation of real-world conditions, as emphasized by Bouslihim et al. (Citation2021b) and Gholizadeh et al. (Citation2015).

Efficiently understanding and preventing soil degradation necessitates a thorough study and mapping of these properties. Digital soil mapping (DSM) emerges as a pivotal approach in this regard, enabling precise predictions of soil SOM and pH beyond the capabilities of conventional soil maps. DSM not only offers a cost-effective solution but also provides insights into map accuracy, empowering informed decision-making (Robertson, Crum, and Ellis Citation1993; Yu et al. Citation2017).

Traditional techniques like kriging and inverse distance weighting (IDW) have been widely employed for developing soil parameter maps at varying scales (Bouasria et al. Citation2021; Tibhirine et al. Citation2023; Zhao et al. Citation2019). Whereas ordinary kriging relies only on spatial relationships among known points, more advanced geostatistical approaches such as regression kriging also incorporate multiple environmental covariates that affect soil formation and distribution (L. Chen et al. Citation2019).

Machine learning (ML) applications in DSM have significantly expanded opportunities (Bouasria et al. Citation2023; S. Chen et al. Citation2022). Extensive research underscores its growing importance at continental and national levels, supported by the availability of free and regular datasets, including satellite images at various resolutions, climate data, and topography (Bouasria et al. Citation2022; Bouslihim et al. Citation2021; Hengl et al. Citation2017, Citation2021; John, Bouslihim, Bouasria, et al. Citation2022; John, Bouslihim, Ofem, et al. Citation2022; Shi et al. Citation2018).

Environmental covariates form fundamental inputs for DSM based on the soil-environment relationship. These include derivatives of digital elevation models (DEM) and remotely sensed spectral data (RS). DEM-derived data represent relief or topographical units, while RS imagery offers valuable information about soil properties through empirical methods (Forkuor et al. Citation2017). Assessing the influence of environmental covariates on SOM and pH is pivotal for predicting soil properties and understanding their relationships with environmental factors (Lamichhane, Kumar, and Wilson Citation2019; Samuel-Rosa et al. Citation2015). Machine learning presents an effective approach for mapping soil parameters based on these covariates.

This study distinguishes itself from previous research by employing a specific set of environmental covariates and remote sensing imagery, enhancing the interpretability of the modelling approach by focusing on relevant predictors. The goal is to provide a framework for leveraging machine learning techniques and high-resolution remote sensing data in predicting and mapping soil properties, thereby improving soil management and conservation practices. The primary objectives include utilizing spectral indices derived from multispectral satellite imagery and terrain attributes from high-resolution digital terrain models to model the spatial variability of SOM and pH using automated hyperparameter tuning of the random forest. Additionally, the study aims to analyse resulting maps to infer soil quality, understand the relationships between environmental covariates and soil properties, and assess their potential impact on land management practices in the Province of Taounate, North Morocco.

The ultimate aim of this research is to enhance soil management and conservation practices through the effective use of machine learning techniques and high-resolution remote sensing data for soil property prediction and mapping.

2. Methods and materials

2.1. Study area

The study area is located in the Province of Taounate in northern Morocco, in the hills north of the Atlas Mountains, 20 km south of Taounate and 15 km north of Fez. It covers 805 km² between 34.134° and 34.392°N and between 4.547° and 5.103°W (Figure 1). The climate is Mediterranean. Short periods of low temperatures can occur in winter, occasionally with snow, and heat waves from the desert can occur in summer. According to the Köppen-Geiger classification, the study area falls in the Csa class (hot-summer Mediterranean climate), with a mean annual temperature of 17.8°C and mean annual precipitation of 549 mm. Elevation ranges from 135 to 833 m. The area is dominated by Forest soils, together with a combination of Mountain soil and Red-Brown soil. Geologically, the study area is characterized by predominantly sedimentary facies (conglomerate, marl, sandstone, clays, limestones, and dolomites) of Triassic age, manifested by alternating marls, limestones, and dolomites, which outcrop within a cover of middle Quaternary age.

Figure 1. The geographical location of the study area and soil samples.


2.2. Soil data and descriptive statistics

Grid sampling with a 1 km spacing was applied (Figure 1). At each point, 1 kg of disturbed soil from the uppermost 30 cm was collected. In total, 191 samples were collected over two months (November and December 2013). The Walkley-Black method (FAO Citation2019) was used for organic carbon measurement, and the conversion factor of 1.724 was applied to estimate the percentage of organic matter. Soil pH was determined by measuring the hydrogen ion activity in an aqueous solution with a pH meter. Several general statistical parameters were calculated, including minimum, maximum, mean, median, 1st quartile, and 3rd quartile. Additionally, the Shapiro-Wilk test was chosen to determine whether the data follow a normal distribution. This test is a good choice for small data sets and is often used instead of the Shapiro-Francia test, which can be less reliable with small samples. The Shapiro-Wilk test is simple to use, requires minimal data, is generally more reliable than other normality tests, and is widely accepted in the research community and available in popular statistical software such as R.
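As an illustration of the conversion and the descriptive statistics described above, the following Python sketch (the study itself used R) applies the 1.724 factor to organic carbon values and computes the same summary parameters. The sample values and function names are hypothetical, and the "inclusive" quantile method is an assumption, since the text does not state which quantile definition was used.

```python
import statistics

OC_TO_SOM_FACTOR = 1.724  # conversion factor quoted in the text


def organic_carbon_to_som(oc_percent):
    """Convert organic carbon (%) to soil organic matter (%)."""
    return oc_percent * OC_TO_SOM_FACTOR


def summarize(values):
    """Return min, 1st quartile, median, mean, 3rd quartile and max."""
    # Assumption: "inclusive" quantiles; the paper does not specify the method.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": q2,
        "mean": statistics.fmean(values),
        "q3": q3,
        "max": max(values),
    }


# Hypothetical organic carbon measurements (%), not values from the study.
oc_samples = [0.5, 1.0, 1.5, 2.0, 2.5]
som_samples = [organic_carbon_to_som(oc) for oc in oc_samples]
summary = summarize(som_samples)
print(summary)
```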

2.3. Environmental covariates

The distribution of soil characteristics is influenced by several environmental factors, which can be represented by covariates such as remote sensing indices, topography, climate, and parent material. However, such data are unavailable in many areas, which complicates the application of DSM methods. For this reason, only two data types that are freely available worldwide were used in the present study. Figure 2 shows all the covariates used. First, a Landsat-8 OLI_TIRS_L2SP image acquired on 3 November 2013, with 0% cloud cover over the area of interest and a spatial resolution of 30 m, was used to extract twenty-one spectral bands and indices: Red, Green, Blue, Near-Infrared (NIR), Albedo, Brightness Index (IB), Chlorophyll Vegetation Index (CVI), Crop Moisture Index (CMI), Difference Vegetation Index (DVI), Ferrous Mineral Index (FMI), Green Vegetation Index (GVI), Iron Oxide Index (IOI), Log-Ratio Vegetation Index (LRVI), Normalized Difference Vegetation Index (NDVI), Normalized Difference Water Index (NDWI1 & NDWI2), Ratio Vegetation Index (RVI), Soil Adjusted Vegetation Index (SAVI), Soil Color Index (ICS), Soil Redness Index (SRI), and Transformed Normalized Difference Vegetation Index (TNDVI). Second, eight terrain attributes, namely elevation, slope, profile curvature (PrC), plan curvature (PlC), general curvature (GCurv), Multi-resolution Valley Bottom Flatness (MrVBF), Multi-resolution Ridge Top Flatness (MrRTF) and Topographic Wetness Index (TWI), were extracted from ALOS-PALSAR data with a resolution of 12.5 m. Next, all 21 remote sensing indices were resampled to the resolution of the terrain attributes. Raster data were manipulated in R/RStudio (R Core Team Citation2021) using the raster, terra, rgdal, sp, sf and PROJ packages.
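The text does not give the formulas of the spectral indices. As a minimal sketch, the following Python code (the study used R) computes five of the red/NIR indices from their commonly used definitions, which are assumptions here: RVI as NIR/Red, TNDVI as the square root of NDVI + 0.5, and SAVI with a soil adjustment factor of 0.5. The function name and reflectance values are hypothetical.

```python
import math


def vegetation_indices(red, nir, soil_factor=0.5):
    """Compute common red/NIR vegetation indices from surface reflectance (0-1).

    The formulas are widely used definitions, assumed here because the
    paper does not list the exact formulas it applied.
    """
    ndvi = (nir - red) / (nir + red)
    return {
        "DVI": nir - red,
        "RVI": nir / red,
        "NDVI": ndvi,
        "TNDVI": math.sqrt(ndvi + 0.5),
        "SAVI": (1 + soil_factor) * (nir - red) / (nir + red + soil_factor),
    }


# Hypothetical reflectance of a vegetated pixel, not data from the study.
indices = vegetation_indices(red=0.1, nir=0.4)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

Because RVI, NDVI, TNDVI, and SAVI are all functions of the same two bands, they are strongly related to one another, which is one reason a covariate selection step is useful before modelling.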

Figure 2. Presentation of environmental covariates: 21 remote sensing indices & 8 terrain attributes.

CMI: Crop Moisture Index, CVI: Chlorophyll Vegetation Index, DVI: Difference Vegetation Index, Elevation (metres), FMI: Ferrous Mineral Index, GCurv: General curvature (Degrees), GVI: Green Vegetation Index, IB: Brightness Index, ICS: Soil Color Index, IOI: Iron Oxide Index, SRI: Soil Redness Index, LRVI: Log-Ratio Vegetation Index, MrRTF: Multi-resolution Ridge Top Flatness, MrVBF: Multi-resolution Valley Bottom Flatness, NDVI: Normalized Difference Vegetation Index, NDWI 1 & 2: Normalized Difference Water Index, NIR: Near-Infrared, PlC: plan curvature (Degrees), PrC: profile curvature (Degrees), RVI: Ratio Vegetation Index, SAVI: Soil Adjusted Vegetation Index, Slope (Degrees), TNDVI: Transformed Normalized Difference Vegetation Index and TWI: Topographic Wetness Index.

2.4. Environmental covariates selection

Before developing the RF model, an environmental covariate selection step is highly recommended to reduce the number of variables and retain the most relevant ones. This step reduces the redundant information carried by certain variables, which would otherwise introduce unnecessary noise, and thereby improves the modelling results. In the present study, the Boruta algorithm was used for environmental covariate selection, run in R using the Boruta package (Kursa and Rudnicki Citation2010). Boruta evaluates the significance of each environmental covariate in relation to the target variable, which helps to filter out irrelevant or noise-inducing covariates and ensures that the selected covariates are directly related to the outcome of interest. The approach relies on the Random Forest technique: the importance of each environmental covariate is compared with the importance of ‘shadow’ attributes, which are synthetic covariates created by randomly shuffling the values of the original covariates. If a covariate’s importance is similar to or lower than that of the shadow attributes, the covariate probably provides no meaningful information for prediction. Conversely, if its importance is significantly higher, the covariate is relevant for making accurate predictions. By applying this process iteratively, Boruta marks environmental covariates as ‘confirmed’, ‘rejected’, or ‘tentative’ and ultimately selects the subset that contributes most to the model’s performance. This can improve model accuracy, reduce overfitting, and give a better understanding of which variables are truly influential in the dataset.

The decision is based on importance Z-scores. The minimum, mean, and maximum Z-scores of the shadow attributes serve as references, and attributes whose Z-score is significantly above the maximum shadow Z-score are confirmed and can be used to develop the RF model. Attributes that are still ‘tentative’ when the iterations end can be resolved with the ‘TentativeRoughFix’ function, which assigns them to the confirmed or rejected group through a simplified comparison of their importance with that of the best shadow attribute.
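The shadow-attribute comparison can be sketched in a few lines of Python (the study used the Boruta package in R). This is a deliberately simplified illustration and not the Boruta algorithm itself: the absolute Pearson correlation with the target stands in for the Random Forest importance Z-scores, no statistical test is applied to the hit counts, and the data and the names `shadow_screen`, `informative`, and `noise` are hypothetical.

```python
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def shadow_screen(features, target, n_iter=50, seed=0):
    """Count how often each feature beats the best shuffled 'shadow' copy.

    Importance is |Pearson r| with the target, a simplified stand-in for
    the Random Forest importance that Boruta actually uses.
    """
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_iter):
        shadow_max = 0.0
        for values in features.values():
            shadow = values[:]       # copy the covariate ...
            rng.shuffle(shadow)      # ... and permute it to break any link
            shadow_max = max(shadow_max, abs(pearson(shadow, target)))
        for name, values in features.items():
            if abs(pearson(values, target)) > shadow_max:
                hits[name] += 1
    return hits


# Hypothetical data: one covariate tied to the target, one unrelated pattern.
target = [float(i) for i in range(30)]
features = {
    "informative": [2.0 * t + 1.0 for t in target],
    "noise": [float((7 * i) % 5) for i in range(30)],
}
hits = shadow_screen(features, target)
print(hits)
```

A covariate that wins in nearly every iteration would be a candidate for ‘confirmed’, and one that rarely wins a candidate for ‘rejected’. Boruta makes this decision with a statistical test on the hit counts, which this sketch omits.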

2.5. Random forest model and environmental covariates importance

Random Forest (RF) is a non-parametric technique that consists of many individual tree models trained from bootstrap data samples (Breiman, Citation2001). The output of each respective tree is aggregated to make a single prediction. This technique can also rank the predictor variable’s relative importance based on the regression prediction error of out-of-bag (OOB) predictions. RF perturbs each variable and calculates the importance as the resulting change in the OOB error (Pouladi et al. Citation2019; Shi et al. Citation2018; Taghizadeh-Mehrjardi, Nabiollahi, and Kerry Citation2016). Two important parameters in RF development are the number of trees (ntree) and the number of variables available for selection in each split (mtry) (Houborg and McCabe Citation2018).

The RF algorithm also provides an impurity-based estimate of environmental covariate importance: the mean decrease in node impurity, measured with Gini impurity for classification or mean squared error (MSE) for regression, produced by the tree splits that use a given covariate. Covariates whose splits reduce the variance most are considered the most important. Each covariate thus receives a numerical score that reflects its contribution to the model’s predictive accuracy, and RF ranks covariates by this score, offering insights into the key determinants behind modelled outcomes. Following this approach, we analysed the covariate importance produced by our RF models to identify the principal environmental variables shaping the spatial patterns of SOM and pH.
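To make the idea of a decrease in node impurity concrete, the following Python sketch computes the reduction in MSE impurity (variance) produced by a single split on one covariate. In a forest, such decreases would be accumulated over all the splits that use a covariate and averaged over the trees. The toy values, thresholds, and function names are hypothetical.

```python
def variance(values):
    """Mean squared deviation from the mean, i.e. the MSE impurity of a node."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def impurity_decrease(x, y, threshold):
    """Decrease in MSE impurity when y is split on x <= threshold."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    if not left or not right:
        return 0.0  # the split separates nothing
    n = len(y)
    # Child impurity is the size-weighted average of the two child variances.
    child = (len(left) / n) * variance(left) + (len(right) / n) * variance(right)
    return variance(y) - child


# Hypothetical toy data: SOM (%) against an informative and an unrelated covariate.
som = [1.0, 1.0, 3.0, 3.0]
ndvi = [0.1, 0.2, 0.6, 0.7]
noise = [10.0, 200.0, 20.0, 210.0]

gain_ndvi = impurity_decrease(ndvi, som, threshold=0.4)
gain_noise = impurity_decrease(noise, som, threshold=100.0)
print(gain_ndvi, gain_noise)
```

In this toy case the split on the informative covariate removes all of the variance in the parent node, whereas the split on the unrelated covariate removes none, so only the former would gain importance.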

2.6. Hyperparameter optimization, model performances and uncertainty

Hyperparameter optimization involves searching for the parameter values that give the best behaviour of a machine learning algorithm. In the present study, a grid search was applied to test combinations of mtry, from 1 to 20 for SOM and from 1 to 14 for pH, with ntree values of 500, 1000, 1500, and 2000, using three repeats of 5-fold cross-validation (CV). The best combination was chosen based on the root mean square error (RMSE) of the repeated CV. The optimal values of ntree and mtry were then used to develop the SOM and pH prediction models. The performance of both models was assessed using 5-fold cross-validation. This approach divides the initial database (191 samples) into five subsets; each time, four subsets are used to train the model and the fifth is used to validate it. The mean result of the five tests was used to evaluate the final performance of the model. Three statistical indices, the coefficient of determination (R2), the mean absolute error (MAE), and the root mean square error (RMSE), were used to evaluate the accuracy of the RF model for the prediction of SOM and pH. Their equations are as follows:

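$$R^2 = 1 - \frac{\sum_i (O_i - P_i)^2}{\sum_i (O_i - \bar{O})^2} \tag{1}$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_i (P_i - O_i)^2}{n}} \tag{2}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_i \lvert O_i - P_i \rvert \tag{3}$$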
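The three indices can be computed directly from paired observed and predicted values. In the Python sketch below, `obs` plays the role of the observed values O, `pred` that of the predicted values P, and n is the number of samples; the numbers are hypothetical and are not results from the study.

```python
import math


def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


# Hypothetical observed and predicted values, not data from the study.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.0, 2.5, 4.0]
print(r2(obs, pred), rmse(obs, pred), mae(obs, pred))
```

RMSE and MAE are expressed in the units of the target (percent for SOM, pH units for pH), whereas R2 is unitless, so the error values of the two models are not directly comparable with each other.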

The prediction interval coverage probability (PICP) quantifies how well the uncertainty of the model’s predictions has been assessed. For a 90% prediction interval, it is calculated as the percentage of cross-validation observations that fall within the interval. A PICP close to 90% indicates that the uncertainty has been properly assessed. A PICP greater than 90% suggests that the uncertainty was overestimated (the intervals are too wide), while a PICP less than 90% indicates that it was underestimated (the intervals are too narrow) (Poggio et al., Citation2021; Shrestha and Solomatine, Citation2006). To visualize the uncertainty as a map, the width of the 90% prediction interval (PI90) is also calculated as the difference between the 95th and 5th quantiles of the predictions (Equation 4). A lower PI90 value indicates a more certain prediction, while a higher PI90 value signifies higher uncertainty in the model’s predictions.

$$\mathrm{PI90} = q_{0.95} - q_{0.05} \tag{4}$$
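Given the 5th and 95th prediction quantiles for each validation sample, both quantities reduce to a few lines. The Python sketch below assumes the quantiles are already available; how they are derived from the forest is not described in the text and is not covered here. All values and function names are hypothetical.

```python
def picp(observed, lower, upper):
    """Percentage of observations that fall inside [lower, upper]."""
    inside = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return 100.0 * inside / len(observed)


def pi90_width(q05, q95):
    """Width of the 90% prediction interval for each sample (Equation 4)."""
    return [hi - lo for lo, hi in zip(q05, q95)]


# Hypothetical observations and prediction quantiles, not data from the study.
observed = [2.0, 2.5, 3.0, 1.0, 4.0]
q05 = [1.5, 2.0, 2.0, 1.2, 3.0]
q95 = [2.5, 3.0, 3.5, 2.0, 4.5]

coverage = picp(observed, q05, q95)
widths = pi90_width(q05, q95)
print(coverage, widths)
```

Here four of the five observations fall inside their interval. Applied pixel by pixel to raster predictions, `pi90_width` gives the kind of uncertainty map described above.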

Moreover, a correlation map between the pH and SOM raster grids was computed using the ‘corLocal’ function of the ‘raster’ R package. The local Pearson correlation coefficient between the two rasters was computed within a focal neighbourhood window of 5 × 5 pixels.
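The idea behind a local correlation map can be illustrated with a small pure-Python sketch: for every cell, the Pearson coefficient is computed from the values of both grids inside the surrounding 5 × 5 window. This is not the ‘corLocal’ implementation. In particular, leaving cells without a complete window empty is an assumption of this sketch, and the grids and function names are hypothetical.

```python
def pearson(a, b):
    """Pearson correlation; None when either input has zero variance."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / (var_a * var_b) ** 0.5


def local_correlation(grid_a, grid_b, window=5):
    """Pearson correlation of two grids within a moving square window."""
    half = window // 2
    rows, cols = len(grid_a), len(grid_a[0])
    out = [[None] * cols for _ in range(rows)]  # None where the window is incomplete
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            cells = [(i, j)
                     for i in range(r - half, r + half + 1)
                     for j in range(c - half, c + half + 1)]
            a = [grid_a[i][j] for i, j in cells]
            b = [grid_b[i][j] for i, j in cells]
            out[r][c] = pearson(a, b)
    return out


# Hypothetical 5 x 5 grids in which pH falls linearly as SOM rises.
som_grid = [[float(5 * i + j) for j in range(5)] for i in range(5)]
ph_grid = [[10.0 - 0.1 * v for v in row] for row in som_grid]
corr_map = local_correlation(som_grid, ph_grid)
print(corr_map[2][2])
```

With these toy grids only the centre cell has a complete window, and its coefficient is close to -1 because the second grid is an exact decreasing linear function of the first.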

3. Results

3.1. Descriptive statistics

Table 1 shows the summary statistics for the 191 samples. SOM values range from 0.505% to 5.094%, with a first quartile of 1.787%, a median of 2.361%, a mean of 2.649%, and a third quartile of 3.631%. Similarly, pH values range from 6.840 to 7.840, with a first quartile of 7.290, a median of 7.400, a mean of 7.393, and a third quartile of 7.490. The mean SOM value in the study area is higher than the 1.34% reported by Bouasria et al. (Citation2021) at Sidi Bennour, in the Doukkala plain (western Morocco). On the other hand, Bouslihim et al. (Citation2021a) reported a higher mean value of organic matter (3.76%) in the Settat-Ben Ahmed plain (central Morocco).

Table 1. Summary statistics for the 191 samples.

Random Forest is non-parametric, so a normality test has no direct bearing on the use of the algorithm; it was used here only to characterize the data distribution before modelling. Figure 3 depicts the distribution of the SOM and pH data, and the histogram indicates that SOM is not normally distributed. A statistical test was therefore used to assess normality more formally.

Figure 3. Correlation matrix plot with a distribution of SOM (a) and pH data (b).


For pH, the Shapiro-Wilk test found insufficient evidence to reject the hypothesis that the data are normally distributed, since the p-value of 0.4612 is larger than the alpha level of 0.05. For SOM, however, the p-value of 2.263 × 10⁻⁶ indicates that the data are not normally distributed.

According to the first quartiles in Table 1, 25% of pH values are below 7.29 and 75% of the organic matter values are higher than 1.79%. The pH values range from near neutrality to slightly alkaline conditions, which may be attributed to the predominant soil type and the region’s farming practices. The low level and high variability of SOM are mainly due to land use patterns, such as crops and grassland, and to farming practices.

Figure 3 also illustrates the normality of the values: pH values show a normal distribution, whereas the distribution of organic matter values is asymmetric, with the mean (2.649%) above the median (2.361%). Furthermore, Figure 3 shows the correlation matrix between the dependent and independent variables. A very strong correlation was identified between SOM and TNDVI, NDVI, and SAVI. Other positive correlations were detected between Red and Albedo, NIR and Blue, and IB and Albedo. On the other hand, MRVBF, MRRTF, PlC, PrC, and Slope have a low linear correlation with almost all the other inputs.

3.2. Environmental covariates selection

Figure 4 shows the results of variable selection by the Boruta algorithm. The blue box plots represent the Z-scores of the minimum (shadowMin), average (shadowMean), and maximum (shadowMax) of the shadow attributes. The red and green box plots represent the Z-scores of the rejected and confirmed attributes, respectively. For SOM (Figure 4a), 20 of the 29 variables (21 spectral indices and 8 terrain attributes) were considered relevant; these are all the spectral indices except CMI, and all the terrain attributes were rejected as irrelevant. For pH (Figure 4b), in the first step, 11 variables were considered relevant (Slope, MRVBF, MRRTF, elevation, LRVI, SAVI, TNDVI, NDVI, RVI, TWI and DVI), 14 variables were rejected, and the remaining 4 variables (IOI, NDWI2, IB and GVI) were designated as tentative. The ‘TentativeRoughFix’ function was used to resolve the tentative variables, and all 4 were consequently rejected. For pH, the terrain attributes (Slope, MRVBF, MRRTF and elevation) were ranked as the most relevant, whereas for SOM the spectral indices (RVI, NDVI and TNDVI) were the most relevant.

Figure 4. Result of environmental covariates selection by Boruta’s algorithm for a) SOM and b) pH prediction.


3.3. Environmental covariates importance

The importance plots are presented here as a form of post hoc analysis, as they offer insights into the contribution of each environmental covariate to the RF performance. This understanding is valuable for interpreting the machine learning models and for communicating to stakeholders which variables are most critical for predicting SOM or pH. The RF importance measures revealed that remote sensing indices contribute most to explaining the spatial distribution of SOM (Figure 5a), more precisely RVI, NDVI, and TNDVI, which relate directly to vegetation cover.

Figure 5. Environmental covariates importance for a) SOM prediction and b) pH prediction.


Healthy vegetation generally indicates a higher potential for organic matter input to the soil, thus areas with high vegetation indices, such as RVI, NDVI, and TNDVI, might be expected to have higher SOM content. While the indices LRVI, NDWI1, SAVI, NDWI2, and FMI were of less importance, they still significantly contributed to SOM prediction. NDWI1 and NDWI2, which are related to water content, could indicate waterlogged soils where decomposition rates might be slower, leading to higher organic matter content.

On the other hand, as seen in Figure 5b, the variable importance for pH prediction was markedly different. Contrary to SOM, the contribution of most remote sensing indices was low. Instead, topographic environmental covariates played a significant role, particularly slope, followed by MRVBF and MRRTF.

Topographic features represent different aspects of the landscape’s physical form, which can significantly influence soil formation and properties. Steeper slopes might experience more erosion or faster water movement, reducing organic matter accumulation, and thus influencing SOM and pH. MRVBF (Multi-resolution Valley Bottom Flatness) and MRRTF (Multi-resolution Ridge Top Flatness) can influence water flow and accumulation patterns, further impacting SOM and pH distribution.

These conceptual relationships between variables highlight the importance of comprehensive environmental data in predicting soil properties and illustrate the value of machine learning in handling complex, non-linear relationships between variables for effective digital soil mapping.

3.4. Model performances and uncertainty

Figure 6 presents the hyperparameter tuning results for the RF model for the prediction of SOM and pH. The figure identifies the combination of ntree and mtry that gives the best model performance based on RMSE. For SOM, ntree = 500 and mtry = 3 yield the lowest RMSE (0.366). For pH, the lowest RMSE (0.15) was obtained with ntree = 2000 and mtry = 2.

Figure 6. RF model tuning for a) SOM prediction and b) pH prediction.


Generally, the default value of mtry for the RF regression model is p/3, where p is the number of input variables (Genuer and Poggi Citation2020). Based on this rule, the default mtry in the present case would be 20/3 ≈ 7 for SOM and 11/3 ≈ 4 for pH. The optimum values of mtry obtained by the automated hyperparameter tuning (3 for SOM and 2 for pH) differ from these defaults in both cases. Furthermore, several observations can be made from Figure 6. First, mtry = 1 always gives the highest RMSE for both SOM and pH. Second, increasing mtry decreases the RMSE until the optimum value is reached for either SOM or pH, and exceeding this value leads to a significant increase in the error. Third, the default values (mtry = 7 and 4 for SOM and pH, respectively) give higher RMSE values than those obtained by the automated hyperparameter tuning (mtry = 3 and 2, respectively). Finally, varying ntree does not change the RMSE significantly, and the results for each mtry value remain close.
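The tuning procedure can be outlined as follows in Python (the study used R). The sketch shows the p/3 rule of thumb, the generation of repeated 5-fold splits for 191 samples, and the selection of the (mtry, ntree) pair with the lowest cross-validated RMSE. Fitting the forest on each split is left abstract: `toy_cv_rmse` is a hypothetical placeholder error surface and not the study’s results. Rounding p/3 to the nearest integer reproduces the default values quoted above, although some implementations round down instead.

```python
import random
from itertools import product


def default_mtry(p):
    """Rule-of-thumb mtry for regression: p/3, rounded to the nearest integer."""
    return max(1, round(p / 3))


def repeated_kfold(n, k=5, repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # k nearly equal folds
        for i in range(k):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, folds[i]


def grid_search(cv_rmse, mtry_values, ntree_values):
    """Return the (mtry, ntree) pair with the lowest cross-validated RMSE."""
    return min(product(mtry_values, ntree_values),
               key=lambda combo: cv_rmse(*combo))


def toy_cv_rmse(mtry, ntree):
    """Placeholder error surface; a real run would fit a forest on each split."""
    return 0.3 + 0.01 * abs(mtry - 3) + 1.0 / ntree


splits = list(repeated_kfold(n=191))
best = grid_search(toy_cv_rmse, range(1, 21), [500, 1000, 1500, 2000])
print(default_mtry(20), default_mtry(11), len(splits), best)
```

In a real run, `cv_rmse` would train a forest with the given mtry and ntree on each training split, predict the held-out fold, and average the resulting RMSE values over the 15 splits.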

The optimal mtry and ntree values for SOM and pH were applied with five-fold cross-validation to develop the two models. Table 2 and the scatter plots (Figure 7) present the performance of the RF model for predicting SOM and pH. The results show that the RF performed well in predicting SOM (R2 = 0.90, RMSE = 0.37%, MAE = 0.28%). The R2 obtained here is higher than those reported by Žížala et al. (Citation2022), Wiesmeier et al. (Citation2011), and Zhang et al. (Citation2021). On the other hand, the tuned RF model performed poorly in pH prediction (R2 = 0.26, RMSE = 0.15, MAE = 0.11).

Figure 7. Scatter plots for observed vs. RF-model predicted values for a) SOM and b) pH.


Table 2. RF model performances (5-fold cross-validation) for SOM and pH prediction.

In general, automated hyperparameter tuning can help improve the performance of RF models by identifying the optimal combination of hyperparameters that result in the best model performance, as assessed by evaluation metrics such as R2, RMSE, and MAE. However, it is essential to note that the performance of RF models can also be influenced by additional factors, including the complexity of the underlying relationships between predictor and response variables, among others. Given this, removing redundant variables before the model is calibrated through automated hyperparameter tuning may lead to a higher performance level for the RF model.

The 90% prediction interval (PI90) map for SOM (Figure 8a) shows the uncertainty of the RF model’s SOM predictions. The prediction uncertainty is small across the study area, with only a few pixels showing high uncertainty. Together with the cross-validation results, this suggests that the model’s predictions are accurate and consistent, and that the model has captured the underlying patterns in the data.

Figure 8. 90% prediction interval (PI90) map for a) SOM and b) pH.

Conversely, the PI90 map of pH (Figure 8b) is noisier, although a substantial portion of pixels still shows low prediction uncertainty. There are a few areas with high uncertainty, especially in the area drained by the river. This suggests that several factors contribute to the variability of soil pH in this region, and the model’s inability to capture all the information required for accurate predictions in these areas highlights the intricate nature of pH dynamics.

Despite the poor performance of the RF model in predicting pH, the PI90 map shows that the uncertainty across large parts of the study area is small. This means that the model is confident in its predictions, even though they may not be accurate. This is unlikely to be due to overfitting, given the 5-fold cross-validation approach, in which every sample is used for both training and validation. Instead, it is more likely that the model is not capturing the underlying relationship between the input environmental covariates and pH, or that the spatial variation of pH is too small to be modelled.

3.5. Spatial prediction of SOM and pH

Generally, SOM content ranges between 2.2 and 2.6% in the Chaouia Region (Aboutayeb, El Yousfi, and El Gharras 2020) and can reach 6.6% in some specific areas (Bouslihim, Rochdi, and Paaza 2021). In the current study (Province of Taounate), soils with 3% SOM content could be explained by the presence of trees, which supply organic matter to the soil. This finding draws attention to the value of returning organic matter to the soil through organic amendments and stabilized biowaste such as composts. This practice improves the bio-physicochemical properties of the soil and thus enhances its fertility (El-Mrini et al. 2021), leading to increased crop yields.

The maps in Figure 9 illustrate the spatial distribution of SOM and pH. These maps were produced using the best model for each parameter and all input variables in raster format at a resolution of 12.5 m.

Figure 9. Spatial prediction of a) SOM and b) pH.

The predicted SOM values vary between 0.66 and 4.77% and do not follow a specific pattern; different ranges are mixed in each area (Figure 9a). According to the classification of Hounkpatin et al. (2022), these soils range from Class IV, characterized by low fertility and severe limitations (SOM between 0.5 and 1.0%), to Class I, considered highly fertile, where SOM exceeds 2%.

Nevertheless, the study area can generally be divided into two parts. The eastern part includes most soils with a low percentage of organic matter, and the western part includes most soils with a high percentage. A close analysis of this distribution, especially when linked to vegetation distribution, shows that most bare land in the eastern region has low percentages of organic matter. Lands with tree cover (of unspecified type) have high percentages of organic matter, reaching 4.77%, which aligns with the results of Tesfaye et al. (2016) and highlights the effect of land use on the distribution of several soil parameters, specifically organic matter.

In terms of acidity, soils are neutral to slightly alkaline. The pH values range from 7.11 to 7.69, as shown in Figure 9b. These findings agree with those of Aboutayeb et al. (2020), who reported pH values of 7.23 and 7.28 for conventional and no-till systems, respectively, in the Chaouia Region in Northwest Morocco.

Visually, the correlation between SOM and pH is clearly observed in the mapping results (Figure 10). The areas with higher SOM content are also characterized by lower pH values, indicating relative soil acidification. This correlation is consistent with previous studies, which have shown that the decomposition of organic matter produces organic acids that can lower soil pH (Hong, Gan, and Chen 2019). This highlights the importance of considering the interplay between different soil quality parameters when managing and protecting soils. For example, efforts to increase SOM content through organic matter management practices may also lead to a decrease in soil pH and potential acidification. Therefore, a holistic approach is needed to balance the benefits and potential drawbacks of different management strategies.
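
A local correlation map of this kind can be approximated with a moving-window Pearson correlation between the two predicted rasters. The source does not describe how the local correlation was computed, so the sketch below is only one plausible approach; the 3 x 3 window, the function names, and the grid values are assumptions chosen for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (None if undefined)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # a constant window has no defined correlation
    return cov / math.sqrt(var_a * var_b)


def local_correlation(grid_a, grid_b, radius=1):
    """Moving-window Pearson correlation between two same-shaped 2-D grids."""
    rows, cols = len(grid_a), len(grid_a[0])
    result = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window_a, window_b = [], []
            # Collect the cells within `radius` of (r, c), clipped at the edges.
            for i in range(max(0, r - radius), min(rows, r + radius + 1)):
                for j in range(max(0, c - radius), min(cols, c + radius + 1)):
                    window_a.append(grid_a[i][j])
                    window_b.append(grid_b[i][j])
            result[r][c] = pearson(window_a, window_b)
    return result


# Hypothetical predicted grids in which pH falls as SOM rises.
som = [[1.0, 1.5, 2.0], [2.5, 3.0, 3.5], [4.0, 4.5, 4.7]]
ph = [[7.69, 7.62, 7.55], [7.48, 7.40, 7.33], [7.22, 7.15, 7.11]]

corr = local_correlation(som, ph)
print(round(corr[1][1], 3))
```

Negative values in the resulting grid mark neighbourhoods where higher predicted SOM coincides with lower predicted pH, which is the pattern described in the text.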

Figure 10. Local correlation between OM and pH predictions.

4. Discussion

Soil pH and organic matter are two parameters that strongly influence soil functions and the availability of nutrients to plants (Bai et al. 2018). Alkaline pH can negatively affect the bioavailability of mineral nutrients. For example, it can reduce phosphorus and iron uptake, which causes plant deficiencies and compromises crop yields. Furthermore, several studies have demonstrated the effect of pH on the availability of nutrients to plants (McCauley, Jones, and Jacobsen 2009). For instance, potassium is highly available under a wide range of pH conditions, whereas phosphorus availability is limited to a pH range between 5.5 and 7.5. Other elements, such as calcium and magnesium, are also available in a narrow pH range (between 6.5 and 8.5). This indicates that pH is a very sensitive parameter and that small changes can have significant effects on pesticide efficacy, organic matter decomposition, and nutrient availability (Hock 2012; Leifeld, Zimmermann, and Fuhrer 2008; Neina 2019). Overall, and as mentioned earlier, the pH range in the study area is appropriate and consistent with most of the indicators relevant to improving soil fertility.

On the other hand, the importance of SOM cannot be ignored; it plays a crucial role in several soil properties: i) biological: it provides a source of energy and a reservoir of nutrients (N, P, S) and contributes to the resilience of the soil/plant system; ii) chemical: it contributes to cation exchange capacity, buffers pH changes, affects nutrient availability, and reduces toxic cation concentrations; and iii) physical: it influences water retention properties, structural stability, and plasticity (Murphy 2015; Obalum et al. 2017; Yang, Chen, and Yang 2019). SOM directly or indirectly affects various soil functions and services, such as food production and climate change mitigation (Bot and Benites 2005; Lal 2004; Minasny et al. 2017).

This study aimed to develop Random Forest (RF) models for predicting Soil Organic Matter (SOM) and pH using remote sensing indices and topographic environmental covariates. The analysis of variable importance was conducted to determine the most influential parameters in explaining the spatial variation of SOM and pH. The results showed that remote sensing indices, such as RVI, NDVI, and TNDVI, contributed most to explaining the spatial distribution of SOM. These results are similar to those reported by Zhang et al. (2021), who mapped regional SOM using an RF model with Sentinel-2A and MODIS data, and contrast with those of Wiesmeier et al. (2011), who reported land use as the most important predictor. Several environmental factors influence SOM spatial distribution, including climatic conditions, geological units, plant cover data, and land use data (Zhang et al. 2021). It is likely that the remote sensing data, specifically the RVI and NDVI, contribute most to the RF model's performance because they are good indicators of vegetation health and productivity. Vegetation health is related to the amount of organic matter in the soil (Van Geel et al. 2019), so it makes sense that these indices would be good predictors of SOM. The RVI is a ratio between the near-infrared and red bands. It can be used to monitor vegetation health, as healthy vegetation reflects more near-infrared radiation and absorbs more red radiation. NDVI is the normalized difference between the near-infrared and red bands, and it is also a good indicator of vegetation health and productivity. Both indices are sensitive to chlorophyll content, which is a good indicator of the amount of photosynthesis in the plant. A higher chlorophyll content means that the plant is healthy and actively photosynthesizing, which is directly linked to the amount of carbon uptake (Croft et al. 2015), and carbon is an important component of SOM (Bhattacharyya et al. 2022; Van Geel et al. 2019). Topographic environmental covariates, such as elevation, slope, and aspect, may also influence SOM, but to a lesser degree than remote sensing data. For example, elevation can affect the amount of water available to plants and the amount of solar radiation the soil receives (Zapata‐Rios et al. 2016), affecting vegetation growth and organic matter accumulation. However, the effect of topographic environmental covariates on SOM may be more subtle and may be overshadowed by the more direct relationship between vegetation health and SOM.
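
The three vegetation indices named above can be written out explicitly. The sketch below uses their commonly cited forms (RVI as NIR/Red, NDVI as the normalized difference, TNDVI as the square root of NDVI + 0.5); the exact band definitions and any scaling used in the study are not given in this section, so these formulas and the reflectance values are assumptions for illustration.

```python
import math


def rvi(nir, red):
    """Ratio Vegetation Index: NIR / Red."""
    return nir / red


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def tndvi(nir, red):
    """Transformed NDVI: sqrt(NDVI + 0.5)."""
    return math.sqrt(ndvi(nir, red) + 0.5)


# Hypothetical surface reflectances (0-1) as (NIR, Red) for two pixels.
pixels = {"dense vegetation": (0.45, 0.05), "bare soil": (0.25, 0.20)}
for name, (nir, red) in pixels.items():
    print(
        name,
        round(rvi(nir, red), 2),
        round(ndvi(nir, red), 2),
        round(tndvi(nir, red), 2),
    )
```

All three indices increase as near-infrared reflectance rises relative to red reflectance, which is why they tend to rank pixels similarly and can be partly redundant as covariates.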

Model performance for pH was weak compared to studies from the Czech Republic, New Zealand, the USA, Quebec, and India, respectively (Žížala et al. 2022; Roudier et al. 2020; Ramcharan et al. 2017; Sylvain, Anctil, and Thiffault 2021; Reddy et al. 2021). The limited spatial variability of pH may be a contributing factor to the poor performance of the RF model. The pH values are all relatively similar (a minimum and maximum of 6.84 and 7.84, respectively), so there may not be enough variation in the data for the model to learn from and make accurate predictions. In that case, the model cannot identify patterns and relationships between the input environmental covariates and the output variable.
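
The effect of a narrow target range on R2 can be shown with a short worked example. Because R2 compares the squared prediction errors with the total variance of the observations, the same absolute error yields a much lower R2 when the observations barely vary. The numbers below are hypothetical and are not taken from the study's dataset.

```python
import math


def r2(observed, predicted):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def rmse(observed, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


# Identical prediction errors applied to two hypothetical targets.
errors = [0.2, -0.2, 0.2, -0.2, 0.2]
wide = [1.0, 2.0, 3.0, 4.0, 5.0]     # target with a wide range
narrow = [7.1, 7.2, 7.3, 7.4, 7.5]   # target with a narrow range

wide_pred = [o + e for o, e in zip(wide, errors)]
narrow_pred = [o + e for o, e in zip(narrow, errors)]

print("wide:   RMSE =", round(rmse(wide, wide_pred), 3),
      "R2 =", round(r2(wide, wide_pred), 3))
print("narrow: RMSE =", round(rmse(narrow, narrow_pred), 3),
      "R2 =", round(r2(narrow, narrow_pred), 3))
```

Both cases have the same RMSE, yet R2 is high for the wide-range target and negative for the narrow-range one, so a low R2 for pH should be read together with RMSE or MAE rather than on its own.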

Another consideration is the small size of the dataset. This problem can be addressed by gathering more samples, which may help improve the model's performance. Other models that may be more robust to weakly correlated data can also be tested. The same problem was reported by Zeraatpisheh et al. (2022), who noted that a lack of variability in the variables could cause poor prediction performance. Dharumarajan et al. (2017) also reported poor performance in pH prediction with the RF model (R2 = 0.30). Unlike organic matter, the pH distribution does not depend on the vegetation distribution. Despite the poor model performance, pH appears to be more closely related to topography, and an effect of slope and other topographic environmental covariates on the distribution of soil acidity is visible. The findings of our study corroborate the reports of Zhang et al. (2019) and Seibert et al. (2007) that the spatial variability of soil pH is influenced by annual temperature range and terrain. Thus, it is possible to observe the accumulation of low pH values along river channels.

Low-SOM soil types (e.g. Luvisols) and pastoral areas are distributed mainly in the eastern part of the study area; therefore, the predicted SOM in that region was relatively low. Moreover, Cambisols are typically found in low-relief places, where soil transport driven by external factors such as runoff and running water led to soil deposition toward the east of the area, resulting in higher SOM. As a result, SOM content in the Taounate region tended to be higher in the west and lower in the east. Zhang et al. (2021) reached contrary conclusions for the Songnen Plain in China. For pH, the low uncertainty suggests that the model is overconfident in its predictions, even though they are not accurate.

5. Conclusion

In conclusion, this study demonstrates the potential of using high-resolution remote sensing data and machine learning models for digital soil mapping. The Random Forest algorithm, combined with a collection of environmental covariates, accurately predicted soil organic matter content with an R2 of 0.90 but performed poorly in pH prediction because of the limited spatial variability of pH in the study area. Vegetation indices play a significant role in SOM prediction, while the topographic environmental covariates have a stronger influence on soil acidity (pH). Despite the limitations in pH prediction, this study provides valuable insights into the spatial variability of soil quality parameters. It highlights the need for further research to improve predictions of pH and other soil quality parameters in the region.

Acknowledgements

The authors would like to express their gratitude to the anonymous reviewers for their constructive feedback.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The datasets used for the current study are available from the corresponding author [YB] on reasonable request.

Additional information

Funding

The author(s) reported there is no funding associated with the work featured in this article.

References

  • Aboutayeb, R., B. El Yousfi, and O. El Gharras. 2020. “Impact of No-Till on Physicochemical Properties of Vertisols in Chaouia Region of Morocco.” Eurasian Journal of Soil Science (EJSS) 9 (2): 119–125. https://doi.org/10.18393/ejss.663502.
  • Asgari, N., S. Ayoubi, A. Jafari, and J. A. Demattê. 2020. “Incorporating environmental variables, remote and proximal sensing data for digital soil mapping of USDA soil great groups.” International Journal of Remote Sensing 41 (19): 7624–7648. https://doi.org/10.1080/01431161.2020.1763506.
  • Ashworth, A. J., F. L. Allen, D. D. Tyler, D. H. Pote, and M. J. Shipitalo. 2017. “Earthworm Populations are Affected from Long-Term Crop Sequences and Bio-Covers Under No-Tillage.” Pedobiologia 60:27–33. https://doi.org/10.1016/j.pedobi.2017.01.001.
  • Bai, Z., T. Caspari, M. R. Gonzalez, N. H. Batjes, P. Mäder, E. K. Bünemann, and Z. Tóth. 2018. “Effects of Agricultural Management Practices on Soil Quality: A Review of Long-Term Experiments for Europe and China.” Agriculture, Ecosystems & Environment 265:1–7. https://doi.org/10.1016/j.agee.2018.05.028.
  • Bhattacharyya, S. S., G. H. Ros, K. Furtak, H. M. Iqbal, and R. Parra-Saldívar. 2022. “Soil Carbon Sequestration–An Interplay Between Soil Microbial Community and Soil Organic Matter Dynamics.” Science of the Total Environment 815:152928. https://doi.org/10.1016/j.scitotenv.2022.152928.
  • Bot, A., and J. Benites. 2005. “The Importance of Soil Organic Matter: Key to Drought-Resistant Soil and Sustained Food Production.” In FAO Soils Bulletin 80. Rome, Italy: Food and Agricultural Organization of the United Nations.
  • Bouasria, A., Y. Bouslihim, S. Gupta, R. Taghizadeh-Mehrjardi, and T. Hengl. 2023. “Predictive Performance of Machine Learning Model with Varying Sampling Designs, Sample Sizes, and Spatial Extents.” Ecological Informatics 78:102294. https://doi.org/10.1016/j.ecoinf.2023.102294.
  • Bouasria, A., K. Ibno Namr, A. Rahimi, and E. M. Ettachfini. 2021. “Geospatial Assessment of Soil Organic Matter Variability at Sidi Bennour District in Doukkala Plain in Morocco.” Journal of Ecological Engineering 22 (11): 120–130. https://doi.org/10.12911/22998993/142935.
  • Bouasria, A., K. Ibno Namr, A. Rahimi, E. M. Ettachfini, and B. Rerhou. 2022. “Evaluation of Landsat 8 Image Pansharpening in Estimating Soil Organic Matter Using Multiple Linear Regression and Artificial Neural Networks.” Geo-Spatial Information Science 25 (3): 353–364. https://doi.org/10.1080/10095020.2022.2026743.
  • Bouslihim, Y., A. Rochdi, R. Aboutayeb, N. El Amrani-Paaza, A. Miftah, and L. Hssaini. 2021. “Soil Aggregate Stability Mapping Using Remote Sensing and GIS-Based Machine Learning Technique.” Frontiers in Earth Science 9:748859. https://doi.org/10.3389/feart.2021.748859.
  • Bouslihim, Y., A. Rochdi, and N. E. A. Paaza. 2021. “Machine Learning Approaches for the Prediction of Soil Aggregate Stability.” Heliyon 7 (3): e06480. https://doi.org/10.1016/j.heliyon.2021.e06480.
  • Breiman, L. 2001. “Random Forests.” Machine Learning 45:5–32.
  • Bünemann, E. K., G. Bongiorno, Z. Bai, R. E. Creamer, G. De Deyn, R. de Goede, and L. Brussaard. 2018. “Soil quality–A critical review.” Soil Biology and Biochemistry 120:105–125. https://doi.org/10.1016/j.soilbio.2018.01.030.
  • Chen, S., D. Arrouays, V. L. Mulder, L. Poggio, B. Minasny, P. Roudier, Z. Libohova, et al. 2022. “Digital Mapping of Soil Properties at a Broad Scale: A Review.” Geoderma 409:115567. https://doi.org/10.1016/j.geoderma.2021.115567.
  • Chen, L., C. Ren, L. Li, Y. Wang, B. Zhang, Z. Wang, and L. Li. 2019. “A Comparative Assessment of Geostatistical, Machine Learning, and Hybrid Approaches for Mapping Topsoil Organic Carbon Content.” ISPRS International Journal of Geo-Information 8 (4): 174. https://doi.org/10.3390/ijgi8040174.
  • Croft, H., J. M. Chen, N. J. Froelich, B. Chen, and R. M. Staebler. 2015. “Seasonal Controls of Canopy Chlorophyll Content on Forest Carbon Uptake: Implications for GPP Modeling.” Journal of Geophysical Research: Biogeosciences 120 (8): 1576–1586. https://doi.org/10.1002/2015JG002980.
  • Devkota, M., Y. Singh, Y. A. Yigezu, I. Bashour, R. Mussadek, and R. Mrabet. 2022. “Conservation Agriculture in the Drylands of the Middle East and North Africa (MENA) Region: Past Trend, Current Opportunities, Challenges and Future Outlook.” Advances in Agronomy 172:253–305. https://doi.org/10.1016/bs.agron.2021.11.001.
  • Dharumarajan, S., R. Hegde, and S. K. Singh. 2017. “Spatial Prediction of Major Soil Properties Using Random Forest Techniques-A Case Study in Semi-Arid Tropics of South India.” Geoderma Regional 10:154–162. https://doi.org/10.1016/j.geodrs.2017.07.005.
  • Doran, J. W., A. J. Jones, M. A. Arshad, and J. E. Gilley. 2018. Soil quality and soil erosion: determinants of soil quality and health, 17–36. CRC Press. https://doi.org/10.1201/9780203739266-2.
  • El-Mrini, S., R. Aboutayeb, K. Azim, and A. Zouhri. 2021. “Co-Composting Process Assessment of Three-Phase Olive Mill Pomace and Turkey Manure in Morocco.” Journal of Southwest Jiaotong University 56 (6): 764–778. https://doi.org/10.35741/issn.0258-2724.56.6.68.
  • Falahatkar, S., S. M. Hosseini, A. Salman Mahiny, S. Ayoubi, and S. Q. Wang. 2014. “Soil Organic Carbon Stock as Affected by Land Use/Cover Changes in the Humid Region of Northern Iran.” Journal of Mountain Science 11 (2): 507–518. https://doi.org/10.1007/s11629-013-2645-1.
  • Food and Agriculture Organization of the United Nations (FAO). 2019. Standard Operating Procedure for Soil Organic Carbon. Walkley-Black Method: Titration and Colorimetric Method. Retrieved from http://www.fao.org/publications/card/en/c/CA7471EN/.
  • Forkuor, G., O. K. Hounkpatin, G. Welp, and M. Thiel. 2017. “High Resolution Mapping of Soil Properties Using Remote Sensing Variables in South-Western Burkina Faso: A Comparison of Machine Learning and Multiple Linear Regression Models.” PloS One 12 (1): e0170478. https://doi.org/10.1371/journal.pone.0170478.
  • Garosi, Y., S. Ayoubi, M. Nussbaum, M. Sheklabadi, M. Nael, and I. Kimiaee. 2022. “Use of the Time Series and Multi-Temporal Features of Sentinel-1/2 Satellite Imagery to Predict Soil Inorganic and Organic Carbon in a Low-Relief Area with a Semi-Arid Environment.” International Journal of Remote Sensing 43 (18): 6856–6880. https://doi.org/10.1080/01431161.2022.2147037.
  • Genuer, R., and J. M. Poggi. 2020. “Random Forests.” In Random Forests with R, 33–55. Cham: Springer. https://doi.org/10.1007/978-3-030-56485-8_3.
  • Gholizadeh, A., L. Borůvka, M. M. Saberioon, J. Kozak, R. Vašát, and K. Němeček. 2015. “Comparing Different Data Preprocessing Methods for Monitoring Soil Heavy Metals Based on Soil Spectral Features.” Soil and Water Research 10 (4): 218–227. https://doi.org/10.17221/113/2015-SWR.
  • Hengl, T., J. Mendes de Jesus, G. B. M. Heuvelink, M. Ruiperez Gonzalez, M. Kilibarda, A. Blagotić, et al. 2017. “SoilGrids250m: Global Gridded Soil Information Based on Machine Learning.” PloS One 12 (2): e0169748. https://doi.org/10.1371/journal.pone.0169748.
  • Hengl, T., M. A. Miller, J. Križan, K. D. Shepherd, A. Sila, M. Kilibarda, O. Antonijević, L. Glušica, A. Dobermann, S. M. Haefele, et al. 2021. “African Soil Properties and Nutrients Mapped at 30 M Spatial Resolution Using Two-Scale Ensemble Machine Learning.” Scientific Reports 11 (1): 1–18. https://doi.org/10.1038/s41598-021-85639-y.
  • He, M., X. Xiong, L. Wang, D. Hou, N. S. Bolan, Y. S. Ok, and D. C. Tsang. 2021. “A Critical Review on Performance Indicators for Evaluating Soil Biota and Soil Health of Biochar-Amended Soils.” Journal of Hazardous Materials 414:125378. https://doi.org/10.1016/j.jhazmat.2021.125378.
  • Hock, W. K. 2012. Effect of pH on Pesticide Stability and Efficacy. Pesticide Safety Education Program (PSEP). Cornell University. http://psep.cce.cornell.edu/facts-slides-self/facts/gen-peapp-ph.aspx.
  • Hong, S., P. Gan, and A. Chen. 2019. “Environmental Controls on Soil pH in Planted Forest and Its Response to Nitrogen Deposition.” Environmental Research 172:159–165. https://doi.org/10.1016/j.envres.2019.02.020.
  • Houborg, R., and M. F. McCabe. 2018. “A Hybrid Training Approach for Leaf Area Index Estimation via Cubist and Random Forests Machine-Learning.” ISPRS Journal of Photogrammetry and Remote Sensing 135:173–188. https://doi.org/10.1016/j.isprsjprs.2017.10.004.
  • Hounkpatin, K. O., A. Y. Bossa, Y. Yira, M. A. Igue, and B. A. Sinsin. 2022. “Assessment of the Soil Fertility Status in Benin (West Africa)–Digital Soil Mapping Using Machine Learning.” Geoderma Regional 28:e00444. https://doi.org/10.1016/j.geodrs.2021.e00444.
  • John, K., Y. Bouslihim, A. Bouasria, R. Razouk, L. Hssaini, I. A. Isong, and G. Ambrose-Igho. 2022. “Assessing the Impact of Sampling Strategy in Random Forest-Based Predicting of Soil Nutrients: A Study Case from Northern Morocco.” Geocarto International 37 (26): 11209–11222. https://doi.org/10.1080/10106049.2022.2048091.
  • John, K., Y. Bouslihim, K. I. Ofem, L. Hssaini, R. Razouk, P. B. Okon, I. A. Isong, P. C. Agyeman, N. M. Kebonye, and C. Qin. 2022. “Do Model Choice and Sample Ratios Separately or Simultaneously Influence Soil Organic Matter Prediction?” International Soil & Water Conservation Research 10 (3): 470–486. https://doi.org/10.1016/j.iswcr.2021.11.003.
  • Karchegani, P. M., S. Ayoubi, M. R. Mosaddeghi, and N. Honarjoo. 2012. “Soil Organic Carbon Pools in Particle-Size Fractions as Affected by Slope Gradient and Land Use Change in Hilly Regions, Western Iran.” Journal of Mountain Science 9 (1): 87–95. https://doi.org/10.1007/s11629-012-2211-2.
  • Kursa, M. B., and W. R. Rudnicki. 2010. “Feature Selection with the Boruta Package.” Journal of Statistical Software 36 (11): 1–13. https://doi.org/10.18637/jss.v036.i11.
  • Lal, R. 2004. “Soil Carbon Sequestration Impacts on Global Climate Change and Food Security.” Science 304 (5677): 1623–1627. https://doi.org/10.1126/science.1097396.
  • Lamichhane, S., L. Kumar, and B. Wilson. 2019. “Digital Soil Mapping Algorithms and Covariates for Soil Organic Carbon Mapping and Their Implications: A Review.” Geoderma 352:395–413. https://doi.org/10.1016/j.geoderma.2019.05.031.
  • Laurent, C., M. N. Bravin, O. Crouzet, C. Pelosi, E. Tillard, P. Lecomte, and I. Lamy. 2020. “Increased soil pH and dissolved organic matter after a decade of organic fertilizer application mitigates copper and zinc availability despite contamination.” Science of the Total Environment 709:135927. https://doi.org/10.1016/j.scitotenv.2019.135927.
  • Leifeld, J., M. Zimmermann, and J. Fuhrer. 2008. “Simulating Decomposition of Labile Soil Organic Carbon: Effects of pH.” Soil Biology and Biochemistry 40 (12): 2948–2951. https://doi.org/10.1016/j.soilbio.2008.08.019.
  • McCauley, A., C. Jones, and J. Jacobsen. 2009. “Soil pH and Organic Matter.” Nutrient Management Module 8 (2): 1–12.
  • Minasny, B., B. P. Malone, A. B. McBratney, D. A. Angers, D. Arrouays, A. Chambers, V. Chaplot, et al. 2017. “Soil carbon 4 per mille.” Geoderma 292:59–86. https://doi.org/10.1016/j.geoderma.2017.01.002.
  • Moussadek, R., R. Mrabet, R. Dahan, A. Zouahri, M. El Mourid, and E. V. Ranst. 2014. “Tillage System Affects Soil Organic Carbon Storage and Quality in Central Morocco.” Applied & Environmental Soil Science 2014:1–8. https://doi.org/10.1155/2014/654796.
  • Mrabet, R., R. Moussadek, A. Fadlaoui, and E. Van Ranst. 2012. “Conservation Agriculture in Dry Areas of Morocco.” Field Crops Research 132:84–94. https://doi.org/10.1016/j.fcr.2011.11.017.
  • Murphy, B. W. 2015. “Impact of Soil Organic Matter on Soil Properties—A Review with Emphasis on Australian Soils.” Soil Research 53 (6): 605–635. https://doi.org/10.1071/SR14246.
  • Neina, D. 2019. “The Role of Soil pH in Plant Nutrition and Soil Remediation.” Applied & Environmental Soil Science 2019:1–9. https://doi.org/10.1155/2019/5794869.
  • Obalum, S. E., G. U. Chibuike, S. Peth, and Y. Ouyang. 2017. “Soil organic matter as sole indicator of soil degradation.” Environmental Monitoring and Assessment 189 (4): 176. https://doi.org/10.1007/s10661-017-5881-y.
  • Poggio, L., L. M. De Sousa, N. H. Batjes, G. Heuvelink, B. Kempen, E. Ribeiro, and D. Rossiter. 2021. “SoilGrids 2.0: Producing Soil Information for the Globe with Quantified Spatial Uncertainty.” Soil 7 (1): 217–240. https://doi.org/10.5194/soil-7-217-2021.
  • Pouladi, N., A. B. Møller, S. Tabatabai, and M. H. Greve. 2019. “Mapping Soil Organic Matter Contents at Field Level with Cubist, Random Forest and Kriging.” Geoderma 342:85–92. https://doi.org/10.1016/j.geoderma.2019.02.019.
  • Ramcharan, A., T. Hengl, T. Nauman, C. Brungard, S. Waltman, S. Wills, and J. Thompson. 2017. Soil Property and Class Maps of the Conterminous US at 100 Meter Spatial Resolution Based on a Compilation of National Soil Point Observations and Machine Learning. arXiv preprint arXiv:1705.08323.
  • R Core Team. 2021. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
  • Reddy, N. N., P. Chakraborty, S. Roy, K. Singh, B. Minasny, A. B. McBratney, A. Biswas, and B. S. Das. 2021. “Legacy Data-Based National-Scale Digital Mapping of Key Soil Properties in India.” Geoderma 381:114684. https://doi.org/10.1016/j.geoderma.2020.114684.
  • Robertson, G. P., J. R. Crum, and B. G. Ellis. 1993. “The Spatial Variability of Soil Resources Following Long-Term Disturbance.” Oecologia 96 (4): 451–456. https://doi.org/10.1007/BF00320501.
  • Roudier, P., O. R. Burge, S. J. Richardson, J. K. McCarthy, G. J. Grealish, and A. G. Ausseil. 2020. “National scale 3D mapping of soil pH using a data augmentation approach.” Remote Sensing 12 (18): 2872. https://doi.org/10.3390/rs12182872.
  • Samuel-Rosa, A., G. B. M. Heuvelink, G. M. Vasques, and L. H. C. Anjos. 2015. “Do more detailed environmental covariates deliver more accurate soil maps?” Geoderma 243-244:214–227. https://doi.org/10.1016/j.geoderma.2014.12.017.
  • Seibert, J., J. Stendahl, and R. Sørensen. 2007. “Topographical Influences on Soil Properties in Boreal Forests.” Geoderma 141 (1–2): 139–148. https://doi.org/10.1016/j.geoderma.2007.05.013.
  • Shi, J. J., L. Yang, A. X. Zhu, C. Z. Qin, P. Liang, C. Y. Zeng, and T. Pei. 2018. “Machine-Learning Variables at Different Scales Vs. Knowledge-Based Variables for Mapping Multiple Soil Properties.” Soil Science Society of America Journal 82 (3): 645–656. https://doi.org/10.2136/sssaj2017.11.0392.
  • Shrestha, D. L., and D. P. Solomatine. 2006. “Machine Learning Approaches for Estimation of Prediction Interval for the Model Output.” Neural Networks 19 (2): 225–235. https://doi.org/10.1016/j.neunet.2006.01.012.
  • Stone, D., K. Ritz, B. G. Griffiths, A. Orgiazzi, and R. E. Creamer. 2016. “Selection of biological indicators appropriate for European soil monitoring.” Applied Soil Ecology 97:12–22. https://doi.org/10.1016/j.apsoil.2015.08.005.
  • Sylvain, J. D., F. Anctil, and É. Thiffault. 2021. “Using Bias Correction and Ensemble Modelling for Predictive Mapping and Related Uncertainty: A Case Study in Digital Soil Mapping.” Geoderma 403:115153. https://doi.org/10.1016/j.geoderma.2021.115153.
  • Taghizadeh-Mehrjardi, R., K. Nabiollahi, and R. Kerry. 2016. “Digital Mapping of Soil Organic Carbon at Multiple Depths Using Different Data Mining Techniques in Baneh Region, Iran.” Geoderma 266:98–110. https://doi.org/10.1016/j.geoderma.2015.12.003.
  • Tajik, S., S. Ayoubi, and M. Zeraatpisheh. 2020. “Digital Mapping of Soil Organic Carbon Using Ensemble Learning Model in Mollisols of Hyrcanian Forests, Northern Iran.” Geoderma Regional 20:e00256. https://doi.org/10.1016/j.geodrs.2020.e00256.
  • Tesfaye, M. A., F. Bravo, R. Ruiz-Peinado, V. Pando, and A. Bravo-Oviedo. 2016. “Impact of Changes in Land Use, Species and Elevation on Soil Organic Carbon and Total Nitrogen in Ethiopian Central Highlands.” Geoderma 261:70–79. https://doi.org/10.1016/j.geoderma.2015.06.022.
  • Tibhirine, Z., K. Ibno Namr, A. Bouasria, B. El Bourhrami, and H. Ettayeb. 2023. “Geospatial and Temporal Assessment of the Variability of Soil Organic Matter and Electrical Conductivity in Irrigated Semi-Arid Area.” Geology, Ecology & Landscapes 1–12. https://doi.org/10.1080/24749508.2023.2179748.
  • Van Geel, M., K. Yu, G. Peeters, K. van Acker, M. Ramos, C. Serafim, and O. Honnay. 2019. “Soil Organic Matter Rather Than Ectomycorrhizal Diversity is Related to Urban Tree Health.” PLoS One 14 (11): e0225714. https://doi.org/10.1371/journal.pone.0225714.
  • Wiesmeier, M., F. Barthold, B. Blank, and I. Kögel-Knabner. 2011. “Digital Mapping of Soil Organic Matter Stocks Using Random Forest Modeling in a Semi-Arid Steppe Ecosystem.” Plant and Soil 340 (1–2): 7–24. https://doi.org/10.1007/s11104-010-0425-z.
  • Wulanningtyas, H. S., Y. Gong, P. Li, N. Sakagami, J. Nishiwaki, and M. Komatsuzaki. 2021. “A Cover Crop and No-Tillage System for Enhancing Soil Health by Increasing Soil Organic Matter in Soybean Cultivation.” Soil and Tillage Research 205:104749. https://doi.org/10.1016/j.still.2020.104749.
  • Yang, X., X. Chen, and X. Yang. 2019. “Effect of Organic Matter on Phosphorus Adsorption and Desorption in a Black Soil from Northeast China.” Soil and Tillage Research 187:85–91. https://doi.org/10.1016/j.still.2018.11.016.
  • Yu, F., B. Faybishenko, A. Hunt, and B. Ghanbarian. 2017. “A Simple Model of the Variability of Soil Depths.” Water 9 (7): 460. https://doi.org/10.3390/w9070460.
  • Zapata‐Rios, X., P. D. Brooks, P. A. Troch, J. McIntosh, and Q. Guo. 2016. “Influence of Terrain Aspect on Water Partitioning, Vegetation Structure and Vegetation Greening in High‐Elevation Catchments in Northern New Mexico.” Ecohydrology 9 (5): 782–795. https://doi.org/10.1002/eco.1674.
  • Zeraatpisheh, M., S. Ayoubi, A. Jafari, and P. Finke. 2017. “Comparing the Efficiency of Digital and Conventional Soil Mapping to Predict Soil Types in a Semi-Arid Region in Iran.” Geomorphology 285:186–204. https://doi.org/10.1016/j.geomorph.2017.02.015.
  • Zeraatpisheh, M., S. Ayoubi, A. Jafari, S. Tajik, and P. Finke. 2019. “Digital Mapping of Soil Properties Using Multiple Machine Learning in a Semi-Arid Region, Central Iran.” Geoderma 338:445–452. https://doi.org/10.1016/j.geoderma.2018.09.006.
  • Zeraatpisheh, M., Y. Garosi, H. R. Owliaie, S. Ayoubi, R. Taghizadeh-Mehrjardi, T. Scholten, and M. Xu. 2022. “Improving the spatial prediction of soil organic carbon using environmental covariates selection: A comparison of a group of environmental covariates.” Catena 208:105723. https://doi.org/10.1016/j.catena.2021.105723.
  • Zhang, Y. Y., W. Wu, and H. Liu. 2019. “Factors Affecting Variations of Soil pH in Different Horizons in Hilly Regions.” PloS One 14 (6): e0218563.
  • Zhang, M., M. Zhang, H. Yang, Y. Jin, X. Zhang, and H. Liu. 2021. “Mapping Regional Soil Organic Matter Based on Sentinel-2A and MODIS Imagery Using Machine Learning Algorithms and Google Earth Engine.” Remote Sensing 13 (15): 2934. https://doi.org/10.3390/rs13152934.
  • Zhao, W., T. Cao, Z. Li, and J. Sheng. 2019. “Comparison of IDW, Cokriging and ARMA for Predicting Spatiotemporal Variability of Soil Salinity in a Gravel–Sand Mulched Jujube Orchard.” Environmental Monitoring and Assessment 191 (6): 1–15. https://doi.org/10.1007/s10661-019-7499-8.
  • Žížala, D., R. Minařík, J. Skála, H. Beitlerová, A. Juřicová, J. R. Rojas, V. Penížek, and T. Zádorová. 2022. “High-Resolution Agriculture Soil Property Maps from Digital Soil Mapping Methods, Czech Republic.” Catena 212:106024. https://doi.org/10.1016/j.catena.2022.106024.
  • Zornoza, R., J. A. Acosta, F. Bastida, S. G. Domínguez, D. M. Toledo, and A. Faz. 2015. “Identification of Sensitive Indicators to Assess the Interrelationship Between Soil Quality, Management Practices and Human Health.” Soil 1 (1): 173–185. https://doi.org/10.5194/soil-1-173-2015.