Research Article

Mapping changes of grassland to arable land using automatic machine learning of stacked ensembles and H2O library

Article: 2294127 | Received 27 Jun 2023, Accepted 06 Dec 2023, Published online: 22 Dec 2023

ABSTRACT

Permanent grasslands play a very important role in the landscape. The loss of permanent grasslands and their conversion into arable land create erosion-prone agricultural areas and have a negative impact on biodiversity. There is therefore a need for accurate and effective monitoring of changes in the agricultural landscape, along with an assessment of the influence of agricultural policies on the landscape. Sentinel-2 from the Copernicus programme has improved the options for using remote sensing data in the monitoring of agricultural land. The aim of this study was to evaluate the potential of the H2O library, its automatic machine learning function (AutoML) and the stacked ensembles it builds for mapping changes from grasslands to arable lands. All results show high overall accuracy, from 93.5% to 96.6%, and high values of the area under the ROC curve (0.94–0.98). Stacked ensembles appear to be the most accurate machine learning models for mapping changes from grasslands to arable lands. The importance of several biological predictors (FAPAR, FCOVER, LAI, NDVI, etc.) was tested with the help of a heatmap that is part of the AutoML function of the H2O library.

Introduction

The European agricultural sector is significantly affected by subsidy policies and market factors (e.g. CAP, support for biofuels, changes in the global market). One of the consequences of agricultural intensification is the expansion of arable land at the expense of permanent grasslands (Schmidt et al., 2018; Soons et al., 2005; Toland et al., 2008). Permanent grasslands are an ecologically important component of the landscape (Sklenička, 2002), and the reduction in their area has a significant negative impact on the ecological stability of the landscape (Ceballos et al., 2010). The detection of changes in the use of agricultural land is therefore a highly relevant topic.

Remote sensing is considered an effective and promising tool for monitoring land cover, including the use of agricultural land (Arrouays et al., 2001; Lobell & Field, 2007; Vertès et al., 2007). Remote sensing technologies offer a fast and efficient way to map large areas, especially where large volumes of satellite data are freely available, as with the Sentinel-2 satellites, which offer a best spatial resolution of 10 m and a temporal resolution of 5 days (Drusch et al., 2012). Recently, a number of methods have been developed to detect conditions and changes using remote sensing technologies. Extensive archives of freely available satellite images, such as Sentinel-2 or Landsat, are widely used for change detection based on time series analyses. The multitemporal approach using a time series is traditionally used to monitor changes in the use of agricultural land, as it effectively captures the differences between the individual phenological phases of the vegetation (crops) (Csillik & Belgiu, 2017; Klouček et al., 2018).

Although significant progress has been made in the accuracy of classification of agricultural lands, the mapping of changes between permanent grassland and arable land remains a relevant research challenge. Mapping changes from permanent grassland to arable land is relatively challenging because the change detection process requires capturing and accurately analysing changes during individual phenological phases (Pakzad et al., 2001). The high temporal resolution of Sentinel-2 data is suitable for use in a time series, which allows oscillations between individual phenological phases of vegetation to be captured. Distinguishing temporary changes caused by the phenological phases of the vegetation from permanent changes is relatively difficult, as the phenological phases of individual plant species are highly variable (Pakzad et al., 2001) and significantly influenced by local and climatic conditions (Lurette et al., 2013). For this reason, the multitemporal approach using a time series has been tested and used in several studies to detect changes between permanent grassland and arable land. Mardian et al. (2021) compared the temporal and spatial resolution of the TERRA MODIS and Landsat satellites and found that the temporal resolution, which favours MODIS, plays a major role in mapping changes from grassland to cropland. Šandera and Štych (2020) proposed a methodology for defining the most relevant variables for mapping changes from grasslands to arable lands based on the Random Forest classifier. Esch et al. (2014) mapped crops and grasslands with the help of LPIS and the C5.0 decision tree classifier with 86% accuracy. Chen et al. (2018) and Xu et al. (n.d.) used a trend-forecast-based approach for cropland change detection using Landsat-derived time series metrics. Csillik and Belgiu (2017) tested time series of Sentinel-2 data with an object-based method for cropland mapping.

An important methodological aspect of change detection based on time series is the selection of predictors that best describe and detect the observed phenomena (Klouček et al., 2018; Šandera & Štych, 2020). A wide range of combinations of spectral bands and vegetation indices has already been studied for this purpose (Bannari et al., 1995). The most widely used vegetation indices include NDVI (Normalized Difference Vegetation Index) and NDMI (Normalized Difference Moisture Index) (Rouse et al., 1974). Other vegetation indices are also used, such as NDWI (Normalized Difference Water Index), which was originally proposed by Gao (1996). Although NDMI and NDWI are primarily dedicated to mapping water content in soil and plants, they can also serve as complementary information in vegetation analysis thanks to the infrared bands used in their calculation. Infrared bands provide additional thematic information that can help to distinguish the health condition of plant species or to detect abrupt changes between agricultural plants on soil blocks. Sensitivity to water content may also reduce the correlation among predictors when many vegetation indices with similar properties, such as NDVI or EVI (Enhanced Vegetation Index) (Huete et al., 2002), are included in the analysis.
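The indices named above share the same normalized-difference form, which can be sketched as follows. This is a minimal illustration, not the authors' processing chain: the reflectance values are hypothetical, and the band pairings (NIR and red for NDVI, NIR and short-wave infrared for Gao's NDWI, green and NIR for McFeeters' NDWI2) are the commonly cited definitions, assumed here rather than taken from the article.

```python
def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    total = band_a + band_b
    if total == 0:
        return None
    return (band_a - band_b) / total


def ndvi(nir, red):
    # NDVI: contrast between near-infrared and red reflectance
    return normalized_difference(nir, red)


def ndwi_gao(nir, swir):
    # NDWI after Gao (1996): near-infrared versus short-wave infrared
    return normalized_difference(nir, swir)


def ndwi2_mcfeeters(green, nir):
    # NDWI2 after McFeeters (1996): green versus near-infrared
    return normalized_difference(green, nir)


# Hypothetical surface reflectances for a single vegetated pixel
pixel = {"green": 0.08, "red": 0.05, "nir": 0.45, "swir": 0.20}

pixel_ndvi = ndvi(pixel["nir"], pixel["red"])
pixel_ndwi = ndwi_gao(pixel["nir"], pixel["swir"])
pixel_ndwi2 = ndwi2_mcfeeters(pixel["green"], pixel["nir"])
```

Because all three indices reuse one helper, adding a further normalized-difference predictor only requires naming the two bands involved.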

Sentinel-2 data allow specific indices to be calculated. These mainly rely on the relevant spectral bands in the near (red edge) or middle infrared, e.g. NDWI2, which uses the near-infrared band in its formula; see McFeeters (1996). Moreover, Sentinel-2 data allow the calculation of dedicated variables known as biophysical parameters (hereafter, these biophysical parameters and the tested vegetation indices are together referred to as biological predictors; see Table 2). These biological predictors were specifically designed to map vegetated areas and can therefore be used to map changes from grasslands to arable lands. They include FAPAR (Fraction of Absorbed Photosynthetically Active Radiation), FCOVER (Fractional Vegetation Cover) and LAI (Leaf Area Index). For more details, please see Table 2. It follows that the selection of the most suitable biological predictors, or their combination, for the accurate detection of changes is a non-trivial research task. The outcome depends mainly on the acquisition time of a satellite image within the time series and on the specific biological predictor or original spectral band. Some bands or indices/predictors are correlated and bring redundant information to the identification of the observed phenomena (Klouček et al., 2018; Šandera & Štych, 2020). There is a continuing need for studies that select suitable predictors and adapt them for specific purposes, such as detecting changes from permanent grassland to arable land.

Another need is to automate classification processes (Szostak et al., 2018) so that satellite image time series can be processed quickly and efficiently with the required accuracy of classification/change detection. There is therefore a need to develop user-friendly and automated computing tasks that can detect changes from grasslands to arable lands accurately and reliably. Modern methods bring new options, as well as new challenges, for processing remote sensing data. One option is offered by machine learning algorithms and their advanced models, stacked ensembles. Stacked ensembles are advanced machine learning models composed of many simpler machine learning algorithms, selected on the basis of their prediction accuracy. The outputs of these simpler algorithms are combined by stacking in order to create a single model with higher accuracy. The stacked ensemble approach was proposed by Wolpert (1992). One problem is that different machine learning algorithms have different parameters that need to be tuned for them to work properly. This creates difficulties for many potential users who are not familiar with these advanced machine learning techniques. Fortunately, there is a user-friendly and effective solution in the form of the H2O library (LeDell et al., 2020). The H2O library implements a function called AutoML (LeDell & Poirier, 2020). This function creates a stacked ensemble from a family of implemented machine learning algorithms.
The family comprises Distributed Random Forest (DRF) (Asgari et al., 2022), Extremely Randomized Trees (XRT) (Wang et al., 2023; Zafari et al., 2019), Generalized Linear Model with regularization (Leyk & Zimmermann, 2007; Morisette & Khorram, 1997), Gradient Boosting Machine (GBM) (Asadollah et al., 2022; Bui et al., 2021), Extreme Gradient Boosting (XGBoost) (Bui et al., 2021; Just et al., 2019; Luo et al., 2021; Ribeiro & dos Santos Coelho, 2020a) and, finally, a Deep Learning algorithm in the form of a fully connected multilayer artificial neural network (Tan & Xue, 2022; Tripathi et al., 2022). For more details, please see the original documentation.

It follows that change detection using time series has to deal with two relevant research challenges. The first is the proper selection of the original spectral band or biological predictor for mapping changes from grasslands to arable lands; the second is the determination of proper input parameters for the individual classification algorithms. Proper tuning of the classifiers' parameters has a significant impact on overall classification accuracy. Vegetation indices are usually selected empirically, yet determining the best one remains challenging, and the input parameters of machine learning classifiers are often not properly tuned. As a result, the classifiers may not reach their full potential in terms of classification accuracy. The challenge is to decide which spectral band or biological predictor, in combination with properly tuned classifiers, achieves the highest accuracy. This can be treated as a form of feature selection, especially in combination with stacked ensembles (Zhang et al., 2022).

The AutoML function solves a major issue for many users because it automatically tunes the hyperparameters of the implemented algorithms with the help of cross-validation (Cross-validation statistics, 2023). The AutoML function requires the user to set the maximum time allowed for testing the input parameters of the implemented classifiers. The classifiers are tested and built on the basis of their best input parameters and overall accuracy. The best model can be a stacked ensemble or an individual classifier, whichever reaches the maximum classification accuracy. The best classifier is then used for the classification itself. The final outputs of this process are a thematic map (a classified image based on the properly tuned classifier) and a heatmap plot that informs the user about variable importance.

H2O implements many machine learning algorithms that have not been widely tested in remote sensing (Zhang et al., 2022), in contrast to the standard Random Forest algorithm (Belgiu & Csillik, 2018; Belgiu & Drăguţ, 2016). These include algorithms that reduce bias and variance: Extremely Randomized Trees, Gradient Boosting Machines and Extreme Gradient Boosting. The ambition of this study is to contribute to identifying which H2O machine learning algorithms are suitable for detecting changes from grassland to arable land. This paper builds on previous studies that proved the effectiveness of selected algorithms and investigated the relevance of selected predictors/indices for changes from grasslands to arable lands (Šandera & Štych, 2019). Following the dynamic progress in machine learning classifiers and open data, variable importance is tested against the acquisition time of Sentinel-2 images in univariate time series for each individual biological predictor. The main research aims of this study are as follows:

  • to develop an advanced machine learning method for mapping changes from grasslands to arable lands with the help of the H2O library

  • to compare and determine the accuracy of the machine learning models in the H2O library for all tested biological predictors

  • to identify the potential optimal time window for each biological predictor tested with the H2O library

  • to develop and provide the resulting method as an open-source algorithm

  • to discuss the pros and cons of the methods used and interpret the results

Materials and methods

Study areas

A total of three study sites in the Czech Republic were selected for the purposes of this study (Figure 1). The sites were selected to be typologically and geographically different and, at the same time, to have a higher probability of changes from permanent grassland to arable land in the period 2015–2020. The selected study areas represent the differences in altitude within the Czech Republic. Altitude influences the start and end of the vegetation season of crop species, so the environmental conditions of the sites clearly differ. Results obtained under these different conditions are more generally valid and valuable. The selected study areas are described as follows:

Figure 1. Map of the study areas.


Study locality No. 1 is located on the border of the Bohemian Paradise and the foothills of the Giant Mountains. It includes areas that are heterogeneous in climate and landscape, with an abundant occurrence of both arable land and permanent grasslands. The coordinates of the central point of the locality are 50.387475 N, 15.257211 E, and its total area is 1340 km2. Study locality No. 2 lies on the border of three regions: the Pardubice, South Moravian and Vysočina regions. From a physical-geographical point of view, this is a highly variable area; higher altitudes and worse conditions for intensive agriculture prevail in the south rather than in the north and east. The central point has the coordinates 49.722447 N, 16.269252 E, and the total area is 1340 km2. Study locality No. 3 is located between the towns of Uhlířské Janovice and Ronov nad Doubravou in the north and Zruč nad Sázavou, Ledeč nad Sázavou and Světlá nad Sázavou in the central part of the study area. This locality is characterised by higher altitudes, hilly terrain and colder weather. The central point has the coordinates 49.729114 N, 15.253716 E, and the total area is 1340 km2. The monitored changes were verified in advance using a vector overlay of layers from the LPIS database. The location of the individual study sites within the Czech Republic can be found in Figure 1.

Input data

A time series of optical data from the Sentinel-2 satellite was used to analyse changes from permanent grassland to arable land. A total of 23 images were used (Table 1). The main criterion was to select images with as little cloud as possible, i.e. with cloud coverage of less than 5% within the entire scene. The intention was to create a time series covering the entire vegetation season of the study site each year, i.e. from March to the end of September, in the observed period from 2015 to 2020. The time series of satellite data was chosen to take the phenological phases of agricultural crops and grasses into account in order to capture the best time period for detecting changes from permanent grassland to arable land. All the images were downloaded from the Open Access Hub portal of the European Space Agency (ESA) (Open Access Hub, copernicus.eu).

Table 1. Acquired satellite images.

Other input data were obtained from the vector layers of the LPIS database, which served as reference data. LPIS (Land Parcel Identification System) is a geographic information system whose basic unit is a soil block, which includes areas where agricultural crops are grown in a regular cycle (Trojáček, 2002). The LPIS database was designed to store unified information about agricultural land for subsidy policies across Europe (https://op.europa.eu/en/publication-detail/-/publication/11049e0e-9a82-11e6-9bca-01aa75ed71a1). Soil blocks are stored in the ESRI shapefile vector format. LPIS for the Czech Republic is freely accessible from the website of the Czech Ministry of Agriculture: https://eagri.cz/public/app/eagriapp/lpisdata/. In the LPIS, permanent grasslands are defined as all areas where grasses have been grown for at least five years and which are subject to a single European Union subsidy policy. Arable land is a ploughed or tilled soil block, generally under a system of crop rotation; unfortunately, no information about the specific crop is recorded.

Polygons of permanent grasslands and arable land were extracted from the LPIS database separately for each boundary year, i.e. 2015 and 2020. These layers were then spatially overlaid in order to identify areas of change from permanent grasslands to arable land. Polygons of permanent grasslands, arable land and change areas served as a mask and as a reference for the creation of training data for the AutoML function. The date of the last edit of the LPIS polygon layers was 31 December for both 2015 and 2020.

Methods used

Satellite data preprocessing

The methodological procedure consists of several basic steps; see Figure 2. In the first step, atmospheric corrections for the Sentinel-2 data were performed using the Sen2Cor module, version 2.5 (Louis et al., 2013; Main-Knorn et al., 2017). Cloud and shadow masking was then performed for each image separately using the scene classification layer, a Sen2Cor product that serves as a cloud and shadow mask. After the atmospheric corrections, all the images were resampled to a pixel size of 10 m by the nearest neighbour method, as Sentinel-2 spectral bands have different spatial resolutions (Drusch et al., 2012). At the detected cloud locations, the missing pixels of the time series were filled in by linear interpolation. The interpolation was performed in the R software environment (R Core Team, 2020) using the approxNA function, which is part of the raster package (Hijmans, 2020). The Sentinel-2 data that underwent these pre-processing steps were then used to calculate the biological predictors (Table 2). The original spectral bands covering the near-infrared part of the electromagnetic spectrum, bands 8 and 8A, were also added to the analysis. Bands B8 and B8A provide spectral information in the near-infrared region, which is important for detecting changes from permanent grassland to arable land.
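The gap-filling step can be illustrated for a single pixel. The sketch below is a plain-Python stand-in for the idea of linear interpolation across cloud-masked dates, not a port of the R function used in the study. It assumes the observations are evenly spaced by position in the series (not by acquisition date), and it leaves gaps at the start or end of the series unfilled; the sample values are hypothetical.

```python
def interpolate_gaps(values):
    """Linearly fill None gaps that have valid neighbours on both sides."""
    filled = list(values)
    # Positions of valid (cloud-free) observations
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


# Hypothetical NDVI series for one pixel; None marks a cloud-masked date
series = [0.2, None, 0.6, None, None, 0.3]
filled_series = interpolate_gaps(series)
```

Weighting by actual acquisition dates instead of positions would only change how `weight` is computed.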

Figure 2. Methodology workflow – designed methods of the change detection process from permanent grassland to arable land.


Table 2. Tested biological predictors.

The selection of the variables (vegetation indices and biophysical variables) was inspired by the most relevant references (e.g. Csillik & Belgiu, 2017; Klouček et al., 2018) and based on our previous studies (e.g. Šandera & Štych, 2020). We selected and calculated the LAI, FCOVER and FAPAR biophysical variables because they potentially offer additional useful information, and it is recommended to combine them with vegetation indices when mapping changes from grasslands to arable lands (Šandera & Štych, 2020). We included the NDVI vegetation index, which has traditionally been used for its sensitivity to green vegetation (Tucker, 1979). The NDWI and NDWI2 vegetation indices were selected because they contain useful information about water content. Green vegetation is a complex of green biomass and water content; therefore, both biophysical parameters and vegetation indices (Table 2) were included in the analysis. The term “biological predictors”, which is used further in the text, groups the biophysical variables and vegetation indices used under one term. The list of predictors is given in Table 2.

The biological predictors (Table 2) were calculated in the SNAP program, version 8.0 (European Space Agency, 2020). Based on these calculations, time series of the individual predictors and spectral bands were created.

Reference data preparation

For the classification and the subsequent evaluation of the accuracy, reference data were created from the change and non-change polygons of the LPIS database obtained by spatial overlay. For each boundary year (2015 and 2020), vector layers containing soil blocks with permanent grasslands and arable land were exported from the LPIS database and subsequently overlaid. The resulting layer served as a mask of the changes from permanent grassland to arable land. The change polygons were validated by visual inspection using a freely available orthophoto (ČÚZK) for the years 2015 and 2020. The average values of the input predictors (Table 2) were calculated for the resulting mask using zonal statistics. For each predictor (Table 2) and each original spectral band, a vector layer was obtained whose attribute table contained the average values for each soil block within the time series ().
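The zonal statistics step reduces all pixels of a soil block to one mean value per predictor and date. A minimal sketch of that aggregation is shown below; it assumes the pixels have already been labelled with a block identifier and flattened into two parallel lists, which is a simplification of a real raster-on-polygon overlay. The identifiers and values are hypothetical.

```python
from collections import defaultdict


def zonal_means(block_ids, pixel_values):
    """Average the valid pixel values falling inside each soil block."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for block_id, value in zip(block_ids, pixel_values):
        if value is None:  # skip masked pixels
            continue
        sums[block_id] += value
        counts[block_id] += 1
    return {block_id: sums[block_id] / counts[block_id] for block_id in counts}


# Hypothetical pixels: block identifier and NDVI value for one acquisition date
block_ids = [1, 1, 2, 2, 2]
ndvi_values = [0.2, 0.4, 0.6, None, 0.8]
block_means = zonal_means(block_ids, ndvi_values)
```

Repeating the call for every date and predictor yields one row per soil block and one column per date, which matches the attribute table described above.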

The change polygons created in the previous steps served as the source of training and validation datasets in the form of points. A basic dataset of 10,000 reference points was created. From this dataset, three independent datasets were derived: one for training the classifier itself, a second for validating the classifier during cross-validation and a third for the accuracy assessment. The reference points were first divided into two parts in a 1:1 ratio, i.e. 50% training points and 50% validation points reserved for the accuracy assessment. The training points were then further split into training (70%) and validation (30%) datasets for training and validating the machine learning classifiers during cross-validation. These proportions follow recommended ratios of training and validation data.

The training points were distributed proportionally according to the total area of the classification classes within the study site using the stratified random sampling method. Two classification classes were defined: soil blocks with change (all the soil blocks of the changes from permanent grassland to arable land) and soil blocks without change.
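The two-stage split described above can be sketched as follows. The sketch assumes that stratification is done by class label and that 10% of the points belong to the change class; both are illustrative assumptions, since the study distributes points according to class area. The function name, seeds and labels are hypothetical.

```python
import random


def stratified_split(labels, fraction, seed):
    """Return (selected, remaining) index lists, sampling `fraction` of each class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    selected, remaining = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * fraction)
        selected.extend(indices[:cut])
        remaining.extend(indices[cut:])
    return selected, remaining


# Hypothetical reference set: 10,000 points, 10% of them in the change class
labels = ["change"] * 1000 + ["no_change"] * 9000

# Stage 1: 50% for model building, 50% held back for the accuracy assessment
model_idx, assessment_idx = stratified_split(labels, 0.5, seed=1)

# Stage 2: model-building points split 70/30 into training and validation
model_labels = [labels[i] for i in model_idx]
train_pos, valid_pos = stratified_split(model_labels, 0.7, seed=2)
```

With these proportions, 3,500 points are used for training, 1,500 for validation and 5,000 for the independent accuracy assessment.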

Implemented algorithms in H2O’s AutoML function

Distributed Random Forest (DRF)

Distributed Random Forest is an enhancement of the original Random Forest algorithm (Breiman, 2001). The core computing mechanism remains almost the same; the main differences concern the decision rules in the nodes of the tree structure. The Distributed Random Forest algorithm splits the data at each node so as to achieve the greatest reduction in the residual sum of squares in each subtree of the whole tree structure (Distributed Random Forest (DRF), 2023). The original Random Forest algorithm uses bootstrap samples of the original data, and the final prediction is obtained by majority voting of the individual trees for classification and by averaging for regression.

Extremely Randomized Trees (XRT, extra trees)

When the Extremely Randomized Trees algorithm (Distributed Random Forest (DRF), 2023; Geurts et al., 2006) is compared to the original Random Forest algorithm, the main difference again concerns the splitting rule in the tree structure. The Extremely Randomized Trees algorithm uses the whole original sample of the source data rather than a bootstrap sample, as the original Random Forest algorithm does. A further difference concerns the cut-points used to create split nodes. The original Random Forest algorithm searches for optimum splits, whereas the Extremely Randomized Trees algorithm chooses splits randomly in order to reduce bias and variance. The Extremely Randomized Trees algorithm is also computationally faster than the original Random Forest.

Gradient Boosting Machine (GBM)

Gradient Boosting Machine is a decision tree-based classification and regression algorithm that uses the boosting technique (Elith et al., 2008; Friedman, 2001; Gradient Boosting Machine (GBM), 2023). The core of boosting is the combination of weak learners in order to create one strong classifier or regressor. In a gradient boosting machine, decision trees serve as the weak learners. The weak learners are combined through a loss function whose error is minimized during the iterative training process: each new weak learner is fitted to the negative gradient of the loss function (for example, the squared error). This differs from the original AdaBoost algorithm, where training samples are reweighted according to the errors of the previous weak learners: the larger the error, the higher the value of the weight coefficient (Freund et al., 1996; Freund & Schapire, 1997; Schapire, 2003).

Extreme Gradient Boosting (XGBoost)

The Extreme Gradient Boosting algorithm (Chen et al., 2017; Chen & Guestrin, 2016; Corresp et al., 2017; XGBoost, 2023) is the successor of GBM and also uses the gradient boosting technique. Compared to GBM, the XGBoost algorithm places more emphasis on controlling over-fitting through regularization: its training objective combines the loss function with a regularization term. It should be noted that, in the most recent version of H2O, the XGBoost algorithm only works under the Linux operating system.

Deep learning algorithm (multilayer perceptron)

H2O’s implementation is a multilayer feedforward artificial neural network (Deep Learning (Neural Networks), 2023), also known as a Multilayer Perceptron, which is trained with stochastic gradient descent using back-propagation. The structure of H2O’s multilayer perceptron is fully customizable. The user can set the number of hidden layers and the number of neurons in each of them, as well as the activation function (tanh, rectifier or maxout). The most recent version of H2O does not implement other common deep learning algorithms such as Convolutional Neural Networks and Recurrent Neural Networks.

Generalized Linear Model (GLM)

The Generalized Linear Model algorithm (Generalized Linear Model (GLM), 2023; Nelder & Wedderburn, 1972) is the only parametric algorithm in H2O’s AutoML function; it supports many distributions (Poisson, binomial, gamma, etc.). The main condition of a parametric algorithm is that the data are assumed to follow one of the distributions mentioned above, in contrast to the other machine learning algorithms, which are non-parametric. For further details, please see the official documentation of the H2O library (https://docs.h2o.ai/h2o/latest-stable/h2o-docs/welcome.html).

All the algorithms described above are tested for their accuracy within H2O’s AutoML function with the help of 10-fold cross-validation as default and are then combined in the form of a stacked ensemble. The main advantage of the AutoML function is that it automatically tunes all the necessary parameters of all implemented machine learning algorithms, so the user is not required to know the internal structure of these algorithms. The basic parameter that the user must set in advance is the time allotted for testing all implemented machine learning algorithms. The default value is 1 hour, which we kept in this study. The best algorithms or stacked ensemble models are then ranked by their accuracy in the final leaderboard (Stacked Ensembles, 2023).
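A run of this kind might look as follows with the H2O Python client. This is a hedged sketch under several assumptions, not the authors' script: the input is taken to be a CSV table with one row per soil block, the response column name `change`, the file path and the seed are hypothetical, and the call requires the `h2o` package and a Java runtime. The time budget and number of folds mirror the values stated in the text.

```python
def automl_settings():
    """Settings mirroring the text: 1 hour time budget, 10-fold cross-validation."""
    # The seed is an assumption added here for repeatability
    return {"max_runtime_secs": 3600, "nfolds": 10, "seed": 1}


def run_automl(csv_path, response="change"):
    """Train AutoML on a table of per-block predictor values.

    `csv_path` and the `response` column name are hypothetical; h2o is
    imported lazily because it needs a local Java runtime to start.
    """
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()
    frame = h2o.import_file(csv_path)
    # Treat the response as categorical so that a classifier is trained
    frame[response] = frame[response].asfactor()
    predictors = [name for name in frame.columns if name != response]

    aml = H2OAutoML(**automl_settings())
    aml.train(x=predictors, y=response, training_frame=frame)

    # Ranked list of all trained models, including stacked ensembles
    print(aml.leaderboard.head())
    return aml.leader
```

A call such as `best = run_automl("ndvi_time_series.csv")` would return the top-ranked model, which can then be applied to new data with `best.predict(...)`. Recent H2O releases should also expose the variable importance heatmap as `varimp_heatmap()` on the AutoML object.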

The classification process consists of building the best model, either a stacked ensemble (Figure 3) or a family of models of a single algorithm, that has previously been tuned with the help of cross-validation. The resulting model is then used for the final prediction of thematic classes for the original time series datasets (). This step adds thematic information in the form of the spatial distribution of changes from grasslands to arable lands for each soil block. Detected changes are stored as binary pixels in the resulting raster. The final step of the detection process is the accuracy assessment procedure. For more details see .

Figure 3. The principle of stacking individual machine learning models.


An important task is to define the relevant variables (biological predictors) in the classification process in order to maximise its overall accuracy. In the time series, the values of the selected biological predictors (Table 2) differ over time; it is therefore necessary to find the optimal time at which changes from grasslands to arable lands can be mapped with maximum overall classification accuracy. H2O’s AutoML function provides a useful plot (heatmap) that shows the importance of each acquisition time of each biological predictor (Table 2) for the best machine learning model tuned during cross-validation. It can therefore be used to define the most effective biological predictor and the optimal time for using the individual predictors to map changes from grasslands to arable lands. This can be considered a relevant form of feature selection, which is necessary especially when stacked ensembles are used, and there is a research gap in this field (Zhang et al., 2022).

In the following step, it is necessary to decide which algorithm will be used for the classification itself. The decision is up to the processor and depends on the purpose of the study. Image classification can be supervised or unsupervised. In unsupervised classification, the classification process is completely automatic, without the processor’s intervention. In contrast, supervised classification requires training data properly prepared by the processor in advance. Once this is done, the chosen classifier executes the classification process, in which the information in the training data is assigned to individual image pixels or other spatial units (soil blocks in this study). The final output of the classification process is usually a thematic raster whose pixels or other spatial units contain information about the thematic classes provided in the training data. The final step of the classification process is the validation of its accuracy (Congalton, 1991; Congalton & Green, 2008).

Accuracy assessment process

Accuracy assessment in this study was separated into two steps. The first step was an evaluation of the performance of stacked ensembles of H2O’s AutoML function for mapping changes from grasslands to arable lands based on a cross-validation process. The second step was an evaluation of the thematic accuracy of predicted final change maps with the help of a standard binary confusion matrix (Congalton, 1991; Congalton & Green, 2008). We used overall accuracy, sensitivity and specificity metrics.

In order to evaluate the performance of each machine learning model after cross-validation, we used the ROC (Receiver Operating Characteristic) curve and the AUC (area under the ROC curve). A ROC curve shows the performance of a binary classifier at all classification thresholds. The Y axis plots sensitivity (recall, or true positive rate, TPR) and the X axis plots the false positive rate (FPR):

(1) $\mathrm{TPR} = \dfrac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$
(2) $\mathrm{FPR} = \dfrac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}$

The true positive rate (TPR) is calculated according to Equation 1, where TP is the number of true positives and FN the number of false negatives. Equation 2 describes the false positive rate (FPR), where FP is the number of false positives and TN the number of true negatives.

The true positive rate and the false positive rate both range between 0 and 1. When the ROC curve is used for performance evaluation, the main task is to find the algorithm that minimises the false positive rate and maximises the true positive rate at the given threshold, which is specific to each classifier. In practice, the closer the ROC curve is to the upper left corner, the better. The AUC (Classification: ROC Curve and AUC, 2023; Receiver Operating Characteristic, n.d.) measures the area between the ROC curve and the X axis of the ROC plot. It ranges between 0 and 1; the closer the AUC is to 1, the better. The performance of the stacked ensembles and individual classifiers implemented in H2O’s AutoML function was evaluated after 10-fold cross-validation.
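The construction of the ROC curve and the AUC described above can be illustrated in a few lines of plain Python. This is a sketch under stated assumptions: the labels and scores are invented for illustration (1 = changed soil block, 0 = unchanged), the function names are hypothetical, and the trapezoidal rule is one common way to compute the area; it is not necessarily how the H2O library computes it internally.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs for every threshold, strictest first."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Process all samples sharing this score as one threshold step.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area between the ROC curve and the X axis (trapezoidal rule)."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical example data, not taken from the study.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

curve = roc_points(labels, scores)
print(curve)
print(round(auc(curve), 4))  # 0.8889, i.e. 8/9
```

In this toy example eight of the nine changed/unchanged pairs are ranked correctly, which is why the area equals 8/9.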

In the second step, the thematic accuracies of the final land cover change maps were assessed. For that purpose, sensitivity (Equation 1), specificity (selectivity, or true negative rate, TNR) and overall accuracy metrics derived from the standard confusion matrix were utilised.

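(3) $\mathrm{TNR} = \dfrac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}}$
(4) $\mathrm{OA} = \dfrac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{FP} + \mathrm{FN} + \mathrm{TN}}$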

The true negative rate (TNR, Equation 3) is calculated from TN, the number of true negatives, and FP, the number of false positives. Overall accuracy (OA, Equation 4) combines TP (true positives), TN (true negatives), FP (false positives) and FN (false negatives).
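The three confusion-matrix metrics can be computed directly from the four counts. The sketch below uses invented counts (1000 soil blocks, of which 50 changed) and a hypothetical function name. It is only an arithmetic illustration of how overall accuracy and sensitivity can diverge when one class dominates, not an explanation of the results of this study.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and overall accuracy from confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),                      # TPR
        "specificity": tn / (tn + fp),                      # TNR
        "overall_accuracy": (tp + tn) / (tp + fp + fn + tn),  # OA
    }


# Hypothetical counts: 50 changed blocks, 950 unchanged blocks.
metrics = confusion_metrics(tp=5, fp=5, fn=45, tn=945)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Here only 5 of the 50 changed blocks are detected (sensitivity 0.1), yet overall accuracy is 0.95 because the unchanged class is much larger. This is why sensitivity and specificity are worth reporting alongside overall accuracy.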

Results

Machine learning models evaluation

Appendix 1 shows the ROC curves for all three study areas. The ROC curves plot the performance of the machine learning models tuned during 10-fold cross-validation. The most accurate biological predictor for Study Area 1 was LAI with an AUC of 0.9880. The best algorithm was the Stacked Ensemble Best of Family model (the term "best of family" refers to the models that had the best overall accuracy after cross-validation; it is used further in the text). The second most accurate biological predictor was FCOVER with an AUC of 0.9855, again with the Stacked Ensemble Best of Family model as the winning algorithm. The FAPAR and NDVI biological predictors shared third position with the same AUC of 0.9398. For FAPAR the best machine learning algorithm was Distributed Random Forest, and for NDVI it was again the Stacked Ensemble Best of Family model. Study Area 2 gives slightly different results. The most accurate model was obtained for the original spectral band B8 of the Sentinel-2 satellite with the Stacked Ensemble Best of Family. The second most accurate machine learning algorithm was Distributed Random Forest for the biological predictor NDWI with an AUC of 0.9744, followed by the NDVI biological predictor with an AUC of 0.9733 and the Stacked Ensemble Best of Family model. Study Area 3 differs from the previous two study areas: the XGBoost algorithm was the best model for the NDVI biological predictor (AUC 0.9610), and the Gradient Boosting Machine algorithm was the best classifier for the biological predictors FAPAR (AUC 0.9610) and FCOVER (AUC 0.9606). For more details see Appendix 1.

As far as the overall performance of the tested machine learning models is concerned, the best model in all three study areas was a Stacked Ensemble, with an AUC of 0.9880 for Study Area 1, 0.9781 for Study Area 2 and 0.9664 for Study Area 3.

Biological predictors importance evaluation

Appendices 2–5 present the heatmaps for Study Area 1, which show the importance of all tested biological predictors (Table 2). It should be kept in mind that univariate time series datasets were tested. They were composed of Sentinel-2 satellite images from different acquisition dates, so each variable represents one biological predictor on one acquisition date. For the original spectral band B8 (Appendix 2), the most important acquisition date was 1 April 2019 according to the Generalized Linear Model algorithm, followed by 16 April 2019. Similar results can be seen for the spectral band B8A (Appendix 2): with the help of the Generalized Linear Model algorithm, the most relevant acquisition date was 16 April 2019, the second 20 June 2017 and the third 31 May 2016. A very similar situation is visible for the biological predictor FAPAR (Appendix 3), where the Generalized Linear Model algorithm again determined the most relevant acquisition date, 13 October 2018; the second most important acquisition date was 16 April 2019. For the biological predictor FCOVER (Appendix 3), the most important acquisition date was 28 August 2020, the second 20 June 2017 and the third 31 May 2018 (Appendix 4). For the biological predictor LAI (Appendix 4), the most relevant acquisition date is clearly 28 September 2017, which was marked as important by a larger number of machine learning algorithms, such as Gradient Boosting Machines, the deep feedforward neural network and XGBoost. The second most important date is 30 June 2019, closely followed by 16 April 2019. For NDVI (Appendix 4), the best acquisition date remains 16 April 2019, and the other acquisition dates are almost redundant. For the biological predictor NDWI (Appendix 5), the most important acquisition date is 6 April 2018, the second 31 May 2018 and the third 20 June 2017. For the NDWI2 biological predictor, the most important acquisition date is 28 August 2020, the second 31 May 2018 and the third 20 June 2017, whereas the other acquisition dates remain strongly redundant (Appendix 5).

In Study Area 2, the most important acquisition date for the original spectral band B8 (Appendix 6) is 28 August 2020, identified by the Gradient Boosting Machine algorithm. The second most important date was 28 September 2018 and the third 29 August 2019. Other acquisition dates in the time series remain largely irrelevant, e.g. 12 September 2020, 8 August 2020 and 13 October 2018. For the B8A spectral band (Appendix 6), the best date was 28 August 2020, the second 28 September 2018 and the third 29 August 2018. The FAPAR biological predictor (Appendix 7) shows slightly different results from the original spectral bands B8 and B8A in terms of variable importance. The three most important acquisition dates, 13 August 2020, 28 August 2020 and 4 August 2016, received higher importance ranks from the Gradient Boosting Machine and XGBoost models as well as from the deep neural network. The FCOVER biological predictor for Study Area 2 (Appendix 7), on the other hand, has the most important acquisition dates 28 August 2020, 29 August 2019 and 12 September 2020. For the biological predictor LAI (Appendix 8), the best date was 28 September 2018, followed by 28 September 2017 and 29 August 2018. For the biological predictor NDVI (Appendix 8), the most important acquisition date was again 28 August 2020, followed by 30 June 2019 and 24 July 2020. The NDWI biological predictor (Appendix 9) revealed 28 September 2018 as the most important acquisition date, followed by 28 September 2017 and 13 August 2020, whereas other dates remain strongly redundant (29 August 2019 at the beginning and 20 April at the end). For the NDWI2 biological predictor (Appendix 9), the most relevant acquisition dates were 28 August 2020, 13 August 2020 and 21 April 2016.

Study Area 3 shows different results (Appendix 10) in terms of the importance of acquisition dates for all biological predictors (Table 2). For the original spectral band B8 (Appendix 7), the most important acquisition date is 1 April 2019, the second 20 June 2017 and the third 16 April 2019. For the spectral band B8A, the best dates were, in order, 16 April 2019, 20 June 2017 and 31 May 2018. For the FAPAR biological predictor (Appendix 11), the most important acquisition date was 28 September 2017, followed by 29 August 2018 and 18 September 2018. For the biological predictor LAI (Appendix 12), the best acquisition date was 28 September 2018, followed by 28 September 2017 and 4 August 2016. The best three dates for the biological predictor NDVI (Appendix 12) are 18 September 2018, 28 September 2018 and 30 August 2015. For the NDWI biological predictor (Appendix 13), the three best dates are 18 September 2018, 13 October 2018 and 29 August 2018, and many acquisition dates are strongly redundant. For the NDWI2 biological predictor (Appendix 13), the best three acquisition dates were 28 September 2018, 4 August 2016 and 29 August 2018.

Thematic accuracy assessment

The thematic accuracy of all maps was evaluated with the help of sensitivity, specificity, and overall accuracy (see Chapter 3.3.4). Sensitivity and specificity results for Study Area 1 are shown in Figure 4a. The values of specificity for all tested predictors (Table 2) are almost at the same level, but the sensitivity values differ more clearly. The highest sensitivity was reached by the biological predictor FAPAR (approximately 0.75), followed by FCOVER (0.7) and NDWI (0.69). Study Area 2 (Figure 4b) shows specificity values similar to those of Study Area 1. On the other hand, the values of sensitivity are generally lower. The highest sensitivity, approximately 0.68, appears for the biological predictor FCOVER, followed by LAI and NDVI with approximately 0.58. For Study Area 3 (Figure 4c), the specificity values are almost identical to those of the previous study areas, but the sensitivities of all tested biological predictors (Table 2) are very low compared with Study Areas 1 and 2. The highest sensitivity, approximately 0.31, was reached by the NDWI2 biological predictor; the second best value was approximately 0.3 for the original spectral band B8A, and the third best predictors were NDWI and NDVI with almost identical sensitivities of 0.21. For the biological predictors FCOVER and LAI, the sensitivities were only around 0.05, which is extremely low. This means that the FCOVER and LAI biological predictors are redundant for Study Area 3 and should not be considered relevant for further investigations in this study region.

Figure 4. Sensitivities and specificities for all tested biological predictors (Table 2) in all three study areas: a) Study Area 1, b) Study Area 2, c) Study Area 3.

When the overall accuracies are compared for all three study areas and all tested biological predictors (Table 2), the highest overall accuracies were obtained for the biological predictors FAPAR, FCOVER, and LAI in Study Area 1. The overall accuracy reached approximately 96.6% for FAPAR, 96% for FCOVER and 96.1% for LAI as well as for the original spectral band B8. The best three biological predictors in terms of overall accuracy for Study Area 2 are FCOVER, NDWI, and NDWI2: approximately 96.25% for FCOVER and 96.2% for NDWI and NDWI2. For Study Area 3, the best overall accuracy, 95.5%, was reached by the original spectral band B8A. The second best result, 95.3%, was shared by the biological predictor NDWI and the original spectral band B8. The lowest overall accuracy for Study Area 3, 94.5%, was obtained for the biological predictor NDWI2. Study Area 3 thus had the lowest sensitivities and overall accuracies for all tested biological predictors in comparison to Study Areas 1 and 2.

Figure 5. Comparison of overall accuracies for all tested biological predictors (Table 2) for all three study areas.

Figure 6 shows, as an example, the spatial distribution of changed soil blocks for the FAPAR biological predictor classified with the H2O AutoML function. There is a large number of changed areas in comparison to unchanged ones. This example thematic map represents the state of changes at the end of the investigated period.

Figure 6. Spatial distribution of changed soil blocks for FAPAR biological predictor of Study Area 1.

Discussion

The aim of this study was to evaluate the machine learning algorithms of H2O for the detection of changes from grassland to arable land. Variable importance was tested across algorithms and across the acquisition dates of the satellite images in the time series for each individual spectral band or biological predictor. For this purpose, we tested several machine learning algorithms implemented in the H2O library and its AutoML function (LeDell & Poirier, 2020). We evaluated all presented machine learning algorithms with ROC curves together with AUC. This study demonstrated that the stacked ensembles and decision tree-based classifiers implemented in the AutoML function of the H2O library provide very good results in terms of overall accuracy. Tested algorithms such as Distributed Random Forest and Extremely Randomized Trees offer effective alternatives to the original Random Forest algorithm, which has been tested in several studies (e.g. Behnamian et al., 2017; Stumpf & Kerle, 2011; Wang et al., 2016). The AutoML function of the H2O library is a user-friendly and effective tool with a heatmap that evaluates the importance of all variables for all algorithms that are part of the AutoML function. The H2O library thus gives users a useful overview of the important and redundant variables in the classification process. In most cases stacked ensembles showed the best accuracy in the detection of changes from grasslands to arable lands, with the highest overall accuracy of 96.6% and high values of area under the ROC curve (0.94–0.98). This compares well with similar studies: Esch et al. (2014) mapped crops and grasslands with a decision tree classifier with 86% accuracy, and Xu et al. (2021) evaluated changes between 1984 and 2016 in selected areas in Africa using the random forest classifier and Landsat data and achieved accuracies ranging between 56 and 94%. Klouček et al. (2018) achieved the best change map with an accuracy of 89.80% and Kappa 0.63. In the study of Šandera and Štych (2019), the Extreme Gradient Boosting algorithm reached the highest overall accuracy of 89.51% (for comparison, AdaBoost with Random Forest as a weak learner reached 87.78% in that study). The methods designed in this study have a high potential for further research and testing in different case studies (Divina et al., 2018; Healey et al., 2018; Y. Wang et al., 2020). An implementation of stacked ensembles in other fields of interest should be useful. There are only a few studies that have utilised the H2O library and its AutoML function in mapping above-ground biomass using Sentinel-2 time series (Costache et al., 2022; Naik et al., 2022). For the purpose of wider testing and validation of our methods, the developed algorithm has been published as open source in a GitHub repository (https://github.com/jsandera/Grassland-to-Cropland-Change-Detection-H2O-library).

In terms of the computational speed of the methods used, the workflow took approximately 10 minutes to obtain the final results on a 12-core Ryzen 7900X processor with 64 GB of RAM. This is similar to the time demands of studies using the standard Random Forest classifier (Mellor et al., 2013; Millard & Richardson, 2015). The H2O library was designed to work with big data in server environments; it therefore offers a parallel implementation of computationally heavy tasks (cross-validation, model predictions), which also runs on standard desktop computers. From this point of view, it is recommended to use this or similar libraries that support parallel execution in a desktop environment. The H2O library is available for the R, Python and Java programming languages and makes advanced machine learning techniques accessible to a wide range of end-users without deeper knowledge of the internal parameters of each algorithm.

This study showed that feature/data selection is extremely important when stacked ensembles are utilised (Zhang et al., 2022). There is a research gap in this area (see Georganos et al., 2018; Kiala et al., 2019; Šandera and Štych, 2019; Šandera and Štych, 2020). The proper time window for satellite image acquisition in mapping the changes from grasslands to arable lands depends on the spectral behaviour of crop and grassland species as well as on environmental conditions (temperature and altitude). The study areas were selected in order to capture these differences and to test how sensitive H2O’s algorithms are to this variability. It must be highlighted that the embedded plotting function of the H2O library provided this basic information by showing interactively the most appropriate dates to map changes from grasslands to arable lands. The spectral behaviour of different crops and grass species might be similar at the peak of the vegetation season. On the other hand, early spring or late summer provide a suitable time window to map changes from grasslands to arable lands: grasslands are still green, and crop species are not yet fully grown (or, in late summer, have been harvested). From the point of view of spectral behaviour, there should be no overlap between grasslands and croplands (at this time almost bare land). The heatmap plot from the H2O library is a useful tool to capture these spectral differences as the most important variables. For all three study areas, there were many significant acquisition dates in late summer and early autumn, such as 28 August, 28 September and 13 October; there were important acquisition dates in early spring as well, e.g. 16 April. Early spring offers an alternative as the best acquisition time to map changes from grasslands to arable lands because many cereal species are in early phenological stages and the spectral conditions of bare land are similar to those in late summer. This confirms the recommendations of previous studies to concentrate on the time when arable lands are almost bare (Esch et al., 2014).

Concerning the selection and evaluation of the Sentinel-2 spectral bands and derived biological predictors, the heatmap plots that are part of the H2O library’s AutoML function helped to solve this task. Had all spectral bands of Sentinel-2 been included, they might have provided redundant information due to the strong correlation between them. The generally recommended approach in remote sensing is to remove strongly correlated spectral bands or vegetation indices, since they may contain redundant information and only increase computation time without increasing accuracy. It was therefore decided to include only those spectral bands of Sentinel-2 that reflect the spectral behaviour of vegetation in general.
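The general idea of removing strongly correlated bands or indices can be sketched as a simple greedy filter on the Pearson correlation. This is only an illustration of the principle, not the selection procedure used in this study: the values, the 0.95 threshold and the function names are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def drop_correlated(features, threshold=0.95):
    """Keep a feature only if |r| with every already kept feature is below threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical per-block values for one acquisition date.
features = {
    "B8": [0.30, 0.35, 0.42, 0.50, 0.28],
    "B8A": [0.31, 0.36, 0.43, 0.52, 0.29],  # nearly proportional to B8
    "NDWI": [0.12, 0.05, 0.09, 0.04, 0.11],
}

print(drop_correlated(features))  # ['B8', 'NDWI']
```

The filter is order-dependent: the first feature of a correlated group is the one that is kept, so the input order should reflect which variables are preferred.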

In terms of the tested original spectral bands and biological predictors, the highest overall accuracies were reached by the biological predictors FAPAR, FCOVER, and LAI. LAI reached the highest AUC, 0.9880, for Study Area 1 with a stacked ensemble as the best classification algorithm, and a thematic accuracy higher than 96% (Figure 5) in Study Areas 1 and 2. FAPAR reached almost 97% overall accuracy in Study Area 1 but its lowest accuracy, 94.6%, in Study Area 3 (Figure 5). FCOVER offers the best accuracy for Study Area 1, 96.4%, in combination with the Distributed Random Forest algorithm; for Study Area 2 the best classifier was a stacked ensemble model and for Study Area 3 the gradient boosting machine algorithm (Appendix 1). On the other hand, the most balanced results in terms of thematic accuracy were provided by the original spectral band B8 for all three study areas with stacked ensemble models. Its overall accuracies are not as high as those of the other tested biological predictors. However, since biological predictors first have to be calculated from the original satellite data, the original spectral band B8 provides reliable, balanced results and saves the additional computations and time. It can therefore be recommended to use the original spectral band B8 of the Sentinel-2 satellite in combination with stacked ensembles from the H2O library to map changes from grasslands to arable lands. The predictors NDWI and NDWI2 showed high importance for the Distributed Random Forest and Extremely Randomized Trees algorithms. These results, which bring useful information based on the evaluation of individual predictors and algorithms, can be considered a novel contribution of this study. However, for more conclusive details, further research is needed. The relationship between the results provided by the heatmap plots of the H2O library and the spectral behaviour of all tested biological predictors should be tested in different case studies in combination with individual crop species.

It is necessary to mention that our study investigated only selected case studies in the Czech Republic. The best results with the highest accuracies were obtained in Study Area 1, followed by Study Areas 2 and 3. The poorer results in Study Areas 2 and 3 are probably caused by the heterogeneous local conditions of these areas. For further research, it would be advisable to study in detail the relationship between the values of the predictors and the conditions of the soil blocks, e.g. soil quality and precipitation. The results suggest that stacked ensembles are strongly sensitive to local climate conditions. There is therefore a need to create many local machine learning models that cope with these microclimatic conditions and to stack them together. This can be achieved with deep learning approaches (Cao et al., 2021; Korzh et al., 2018; Prakash et al., 2022). A further recommendation is to include detailed information about crops. This would help to refine the findings, explain the importance of the predictors over time and validate the results. Such a study should concentrate on the spectral behaviour of individual crops and grass species in soil blocks.

Based on our results, Sentinel-2 data seem to be a very useful data source for the detection of changes from grassland to arable land. Both the spectral and the temporal resolution are suitable. However, data were missing in relevant periods during our investigation, so we utilised univariate multitemporal datasets in which the gaps were filled by linear interpolation. Although we applied this method only to missing images and did not produce a daily output product, the interpolation might influence the spectral character of the time series. There are several alternatives for interpolation and time series smoothing that could be implemented and tested, such as the Savitzky-Golay algorithm (Wikipedia contributors, 2023). Further studies should concentrate on testing interpolation and smoothing methods. Including a wider range of in-situ and auxiliary data would be relevant for a continuation of this kind of study. The LPIS database was used as the reference data source in this study. Information in this database is collected from the owners of the parcels, and its accuracy depends on the owners. Although we checked all the parcels against aerial images, further validation of the input data based on independent data sources would be desirable. Another shortcoming of the LPIS database is that it provides only binary information (change vs. no change). Information about crops and grass species is not provided for free. From this point of view, in-situ research on crops and species over time, combined with information from different sources (e.g. LUCAS), is a recommended step for future research.
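Filling a missing acquisition by linear interpolation in time, as mentioned above, can be sketched as follows. The dates and values are hypothetical, the function name is an assumption, and the authors' actual implementation may differ, for example in how gaps at the start or end of a series are handled (they are left unfilled here).

```python
from datetime import date


def interpolate_missing(series):
    """Fill None values linearly in time between the nearest valid neighbours.

    series: list of (date, value or None) tuples sorted by date.
    Gaps before the first or after the last valid value stay None.
    """
    known = [(d, v) for d, v in series if v is not None]
    filled = []
    for d, v in series:
        if v is not None:
            filled.append((d, v))
            continue
        before = [(kd, kv) for kd, kv in known if kd < d]
        after = [(kd, kv) for kd, kv in known if kd > d]
        if not before or not after:
            filled.append((d, None))
            continue
        (d0, v0), (d1, v1) = before[-1], after[0]
        weight = (d - d0).days / (d1 - d0).days  # position between neighbours
        filled.append((d, v0 + weight * (v1 - v0)))
    return filled


# Hypothetical predictor values of one soil block.
ndvi = [
    (date(2019, 4, 1), 0.40),
    (date(2019, 4, 16), None),  # missing image
    (date(2019, 5, 1), 0.70),
]

for day, value in interpolate_missing(ndvi):
    print(day.isoformat(), value)
```

Because the weight is based on the number of days, unevenly spaced acquisitions are handled correctly; the missing value here lies halfway between its neighbours and is filled with approximately 0.55.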

Conclusion

This study deals with the implementation and evaluation of the machine learning algorithms of H2O for the detection of changes from grassland to arable land. The importance of the algorithms, of the acquisition dates of the input data and of the individual biological predictors was tested and evaluated. For this purpose, advanced machine learning methods for mapping changes from grasslands to arable lands with the help of the H2O library were designed and assessed, using the library's AutoML function. The methods are available as open source in a GitHub repository: https://github.com/jsandera/Grassland-to-Cropland-Change-Detection-H2O-library

The stacked ensembles composed of machine learning algorithms in the H2O AutoML function proved their efficiency in mapping the changes from grasslands to arable lands. The most relevant results show high overall accuracy, from 93.5 to 96.6%, and high values of area under the ROC curve (0.94–0.98). Stacked ensembles appear to be the most accurate machine learning models. The importance of several biological predictors (FAPAR, FCOVER, LAI, NDVI, etc.) was tested with the heatmap that is part of the AutoML function of the H2O library. The information from this heatmap is extremely important for identifying the most important acquisition dates in the time series of each tested biological predictor as well as the most relevant algorithms. Images from early spring or late summer provide a suitable time window. Among the tested predictors, the highest overall accuracy was reached by the biological predictors FAPAR, FCOVER, and LAI. Tested algorithms such as Distributed Random Forest and Extremely Randomized Trees are effective tools comparable with the original Random Forest algorithm. The evaluation and identification of the most efficient algorithms of the H2O library, together with the original results derived from the time series analysis (importance of the predictors and time window), are the most important research contributions of this study. The H2O library with its machine learning algorithms and AutoML function is a very strong tool for change detection, with very good prospects for future research. Testing these methods in different case studies, and evaluating and interpreting the results with detailed information about local conditions and crop types, are challenges for subsequent research.

Author contributions

Jiří Šandera conceptualised and suggested the methodology, programmed the software, executed the main investigation and wrote the original draft. Přemysl Štych contributed and supervised the methods and text and provided the main funding. All the authors have read and agreed to the published version of the manuscript.

Acknowledgments

We would like to thank the European Union’s Caroline Herschel Framework. We would also like to thank the anonymous reviewers for their substantial help with improving the manuscript.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Supplemental data

Supplemental data for this article can be accessed online at https://doi.org/10.1080/22797254.2023.2294127

Additional information

Funding

This study was supported by the European Union’s Caroline Herschel Framework Partnership Agreement on Copernicus User Uptake under grant agreement no. FPA 275/G/GRO/COPE/17/10042, project FPCUP (Framework Partnership Agreement on Copernicus User Uptake).

References

  • Arrouays, D., Deslais, W., & Badeau, V. (2001). The carbon content of topsoil and its geographical distribution in France. Soil Use and Management, 17(1), 7–18. https://doi.org/10.1111/j.1475-2743.2001.tb00002.x
  • Asadollah, S. B. H. S., Sharafati, A., & Shahid, S. (2022). Application of ensemble machine learning model in downscaling and projecting climate variables over different climate regions in Iran. Environmental Science and Pollution Research, 29, 1–20. https://doi.org/10.1007/s11356-021-16964-y
  • Asgari, M., Yang, W., & Farnaghi, M. (2022). Spatiotemporal data partitioning for distributed random forest algorithm: Air quality prediction using imbalanced big spatiotemporal data on spark distributed framework. Environmental Technology & Innovation, 27, 102776. https://doi.org/10.1016/j.eti.2022.102776
  • Bannari, A., Morin, D., Bonn, F., & Huete, A. R. (1995). A review of vegetation indices. Remote Sensing Reviews, 13(1–2), 95–120. https://doi.org/10.1080/02757259509532298
  • Behnamian, A., Millard, K., Banks, S. N., White, L., Richardson, M., & Pasher, J. (2017). A systematic approach for variable selection with random forests: Achieving stable variable importance values. Retrieved June 15, 2021, from https://ieeexplore.ieee.org/abstract/document/8038868/
  • Belgiu, M., & Csillik, O. (2018). Sentinel-2 cropland mapping using pixel-based and object-based time-weighted dynamic time warping analysis. Remote Sensing of Environment, 204, 509–523. https://doi.org/10.1016/j.rse.2017.10.005
  • Belgiu, M., & Drăguţ, L. (2016). Random forest in remote sensing: A review of applications and future directions. ISPRS Journal of Photogrammetry and Remote Sensing, 114, 24–31. https://doi.org/10.1016/j.isprsjprs.2016.01.011
  • Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
  • Bui, Q.-T., Chou, T.-Y., Hoang, T.-V., Fang, Y.-M., Mu, C.-Y., Huang, P.-H., Pham, V.-D., Nguyen, Q.-H., Anh, D. T. N., Pham, V.-M., & Meadows, M. E. (2021). Gradient boosting machine and object-based CNN for land cover classification. Remote Sensing, 13(14), 2709. https://doi.org/10.3390/rs13142709
  • Campbell, S. G. (2020). The researcher’s complete guide to Leaf Area Index (LAI). https://www.metergroup.com/environment/articles/lp80-pain-free-leaf-area-index-lai/
  • Cao, D., Xing, H., Wong, M. S., Kwan, M. P., Xing, H., & Meng, Y. (2021). A stacking ensemble deep learning model for building extraction from remote sensing images. Remote Sensing, 13(19), 3898. https://doi.org/10.3390/RS13193898
  • Ceballos, G., Davidson, A., List, R., Pacheco, J., Manzano-Fischer, P., Santos-Barrera, G., Cruzado, J., & Hansen, D. M. (2010). Rapid decline of a grassland system and its ecological and conservation implications. PLoS ONE, 5(1), e8562. https://doi.org/10.1371/journal.pone.0008562
  • Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 13-17 August 2016, 785–794. https://doi.org/10.1145/2939672.2939785
  • Chen, T., He, T., Benesty, M., Khotilovich, V., & Tang, Y. (2017). XGBoost: Extreme gradient boosting. https://cran.r-project.org/package=xgboost
  • Chen, J., Liu, H., Chen, J., & Peng, S. (2018). Trend forecast based approach for cropland change detection using Landsat-derived time-series metrics. International Journal of Remote Sensing, 39(21), 7587–7606. https://doi.org/10.1080/01431161.2018.1475774
  • Classification: ROC Curve and AUC. Machine learning | Google developers. Retrieved May 16, 2023, from https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc
  • Congalton, R. G. (1991). A review of assessing the accuracy of classifications of remotely sensed data. Remote Sensing of Environment, 37(1), 35–46. https://doi.org/10.1016/0034-4257(91)90048-B
  • Congalton, R. G., & Green, K. (2008). Assessing the accuracy of remotely sensed data: Principles and practices (2nd ed.). https://doi.org/10.1201/9781420055139
  • Mitchell, R., & Frank, E. (2017). Accelerating the XGBoost algorithm using GPU computing. https://doi.org/10.7287/PEERJ.PREPRINTS.2911V1
  • Costache, R., Trung Tin, T., Arabameri, A., Crăciun, A., Ajin, R. S., Costache, I., Reza, A., Abba, S. I., Sahana, M., Avand, M., & Thai Pham, B. (2022). Flash-flood hazard using deep learning based on H2O R package and fuzzy-multicriteria decision-making analysis. Journal of Hydrology, 609, 127747. https://doi.org/10.1016/J.JHYDROL.2022.127747
  • Cross-validation (statistics). (2023, April 23). Wikipedia. https://en.wikipedia.org/wiki/Cross-validation_(statistics)#cite_note-McLachlan-16
  • Csillik, O., & Belgiu, M. (2017). Cropland mapping from Sentinel-2 time series data using object-based image analysis. Proceedings of the 20th AGILE International Conference on Geographic Information Science Societal Geo-Innovation Celebrating, Wageningen, The Netherlands, (Vol. 9, pp. 20).
  • Deep Learning (Neural Networks). (2023, May 16). H2O 3.40.0.4 documentation. https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/deep-learning.html
  • Distributed Random Forest (DRF). (2023, May 16). H2O 3.40.0.4 documentation. https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/drf.html
  • Divina, F., Gilson, A., Goméz-Vela, F., García Torres, M., & Torres, J. F. (2018). Stacking ensemble learning for short-term electricity consumption forecasting. Energies, 11(4), 949. https://doi.org/10.3390/en11040949
  • Drusch, M., Del Bello, U., Carlier, S., Colin, O., Fernandez, V., Gascon, F., Hoersch, B., Isola, C., Laberinti, P., Martimort, P., Meygret, A., Spoto, F., Sy, O., Marchese, F., & Bargellini, P. (2012). Sentinel-2: ESA’s optical high-resolution mission for GMES operational services. Remote Sensing of Environment, 120, 25–36. https://doi.org/10.1016/j.rse.2011.11.026
  • Elith, J., Leathwick, J. R., & Hastie, T. (2008). A working guide to boosted regression trees. Journal of Animal Ecology, 77(4), 802–813. https://doi.org/10.1111/j.1365-2656.2008.01390.x
  • Esch, T., Metz, A., Marconcini, M., & Keil, M. (2014). Combined use of multi-seasonal high and medium resolution satellite imagery for parcel-related mapping of cropland and grassland. International Journal of Applied Earth Observation and Geoinformation, 28, 230–237. https://doi.org/10.1016/j.jag.2013.12.007
  • European Space Agency. (2020). SNAP - ESA Sentinel application platform.
  • Freund, Y., & Schapire, R. E. (1996, July). Experiments with a new boosting algorithm. ICML, 96, 148–156.
  • Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139. https://doi.org/10.1006/jcss.1997.1504
  • Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29(5), 1189–1232. https://doi.org/10.1214/aos/1013203451
  • Gao, B. C. (1996). NDWI—A normalized difference water index for remote sensing of vegetation liquid water from space. Remote Sensing of Environment, 58(3), 257–266. https://doi.org/10.1016/S0034-4257(96)00067-3
  • Generalized Linear Model. (2023, May 16). H2O 3.40.0.4 documentation. https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/glm.html#glm
  • Georganos, S., Grippa, T., Vanhuysse, S., Lennert, M., Shimoni, M., Kalogirou, S., & Wolff, E. (2018). Less is more: Optimizing classification performance through feature selection in a very-high-resolution remote sensing object-based urban application. GIScience & Remote Sensing, 55(2), 221–242. https://doi.org/10.1080/15481603.2017.1408892
  • Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1), 3–42. https://doi.org/10.1007/s10994-006-6226-1
  • Gitelson, A. A., Kaufman, Y. J., Stark, R., & Rundquist, D. (2002). Novel algorithms for remote estimation of vegetation fraction. Remote Sensing of Environment, 80(1), 76–87. https://doi.org/10.1016/S0034-4257(01)00289-9
  • Gradient Boosting Machine. (2023, May 16). H2O 3.40.0.4 documentation. https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/gbm.html
  • Healey, S. P., Cohen, W. B., Yang, Z., Kenneth Brewer, C., Brooks, E. B., Gorelick, N., Hernandez, A. J., Huang, C., Joseph Hughes, M., Kennedy, R. E., Loveland, T. R., Moisen, G. G., Schroeder, T. A., Stehman, S. V., Vogelmann, J. E., Woodcock, C. E., Yang, L., & Zhu, Z. (2018). Mapping forest change using stacked generalization: An ensemble approach. Remote Sensing of Environment, 204, 717–728. https://doi.org/10.1016/J.RSE.2017.09.029
  • Hijmans, R. J. (2020). Raster: Geographic data analysis and modeling. https://cran.r-project.org/package=raster
  • Huete, A., Didan, K., Miura, T., Rodriguez, E. P., Gao, X., & Ferreira, L. G. (2002). Overview of the radiometric and biophysical performance of the MODIS vegetation indices. Remote Sensing of Environment, 83(1–2), 195–213. https://doi.org/10.1016/S0034-4257(02)00096-2
  • Just, A. C., Liu, Y., Sorek-Hamer, M., Rush, J., Dorman, M., Chatfield, R., Wang, Y., Lyapustin, A., & Kloog, I. (2019). Gradient boosting machine learning to improve satellite-derived column water vapor measurement error. Atmospheric Measurement Techniques Discussions 2019, 13(9), 1–22. https://doi.org/10.5194/amt-13-4669-2020
  • Kiala, Z., Mutanga, O., Odindi, J., & Peerbhay, K. (2019). Feature selection on Sentinel-2 multispectral imagery for mapping a landscape infested by parthenium weed. Remote Sensing, 11(16), 1892. https://doi.org/10.3390/RS11161892
  • Klouček, T., Moravec, D., Komárek, J., Lagner, O., & Štych, P. (2018). Selecting appropriate variables for detecting grassland to cropland changes using high resolution satellite data. PeerJ, 6, e5487. https://doi.org/10.7717/peerj.5487
  • Korzh, O., Cook, G., Andersen, T., & Serra, E. (2018). Stacking approach for CNN transfer learning ensemble for remote sensing imagery. 2017 Intelligent Systems Conference, IntelliSys 2017, (pp. 599–608). https://doi.org/10.1109/INTELLISYS.2017.8324356
  • LeDell, E., Gill, N., Aiello, S., Fu, A., Candel, A., Click, C., Kraljevic, T., Nykodym, T., Aboyoun, P., Kurka, M., & Malohlava, M. (2020). H2o: R interface for the “H2O” scalable machine learning platform. https://github.com/h2oai/h2o-3
  • LeDell, E., & Poirier, S. (2020, July). H2O AutoML: Scalable automatic machine learning. Proceedings of the AutoML Workshop at ICML (Vol. 2020). ICML. https://sites.google.com/view/automl2020/home
  • Leyk, S., & Zimmermann, N. E. (2007). Improving land change detection based on uncertain survey maps using fuzzy sets. Landscape Ecology, 22(2), 257–272. https://doi.org/10.1007/s10980-006-9021-2
  • Lobell, D. B., & Field, C. B. (2007). Global scale climate–crop yield relationships and the impacts of recent warming. Environmental Research Letters, 2(1), 14002. https://doi.org/10.1088/1748-9326/2/1/014002
  • Louis, J., Gascon, F., Niezette, M., Müller-Wilm, U., & Richter, R. (2013). Sentinel-2 level-2A prototype processor: Architecture, algorithms and first results. https://www.researchgate.net/publication/273002025
  • Luo, M., Wang, Y., Xie, Y., Zhou, L., Qiao, J., Qiu, S., & Sun, Y. (2021). Combination of feature selection and CatBoost for prediction: The first application to the estimation of aboveground biomass. Forests, 12(2), 216. https://doi.org/10.3390/f12020216
  • Lurette, A., Aubron, C., & Moulin, C.-H. (2013). A simple model to assess the sensitivity of grassland dairy systems to scenarios of seasonal biomass production variability. Computers and Electronics in Agriculture, 93, 27–36. https://doi.org/10.1016/j.compag.2013.01.008
  • Main-Knorn, M., Pflug, B., Louis, J., Debaecker, V., Müller-Wilm, U., & Gascon, F. (2017). Sen2Cor for Sentinel-2. Image and Signal Processing for Remote Sensing XXIII, 10427, 1042704. https://doi.org/10.1117/12.2278218
  • Mardian, J., Berg, A., & Daneshfar, B. (2021). Evaluating the temporal accuracy of grassland to cropland change detection using multitemporal image analysis. Remote Sensing of Environment, 255, 112292. https://www.sciencedirect.com/science/article/pii/S0034425721000109
  • McFeeters, S. K. (1996). The use of the normalized difference water index (NDWI) in the delineation of open water features. International Journal of Remote Sensing, 17(7), 1425–1432. https://doi.org/10.1080/01431169608948714
  • Mellor, A., Haywood, A., Stone, C., & Jones, S. (2013). The performance of random forests in an operational setting for large area sclerophyll forest classification. Remote Sensing, 5(6), 2838–2856. https://doi.org/10.3390/rs5062838
  • Millard, K., & Richardson, M. (2015). On the importance of training data sample selection in Random Forest image classification: A case study in peatland ecosystem mapping. Remote Sensing, 7(7), 8489–8515. https://doi.org/10.3390/rs70708489
  • Morisette, J. T., & Khorram, S. (1997). An introduction to using generalized linear models to enhance satellite-based change detection. IGARSS’97. 1997 IEEE International Geoscience and Remote Sensing Symposium Proceedings. Remote Sensing-A Scientific Vision for Sustainable Development, 4, (pp. 1769–1771). IEEE.
  • Naik, P., Dalponte, M., & Bruzzone, L. (2022). Automated machine learning driven stacked ensemble modelling for forest aboveground biomass prediction using multitemporal Sentinel-2 data. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. IEEE Geoscience & Remote Sensing Society. https://ieeexplore.ieee.org/xpl/aboutJournal.jsp?punumber=4609443
  • Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized linear models. Journal of the Royal Statistical Society Series A (General), 135(3), 370–384. https://doi.org/10.2307/2344614
  • Pakzad, K., Growe, S., Heipke, C., & Liedtke, C.-E. (2001). Multitemporale Luftbildinterpretation: Strategie und Anwendung. KI, 15(4), 10–16.
  • Prakash, J. A., Ravi, V., Sowmya, V., & Soman, K. P. (2022). Stacked ensemble learning based on deep convolutional neural networks for pediatric pneumonia diagnosis using chest X-ray images. Neural Computing and Applications, 35(11), 8259–8279. https://doi.org/10.1007/s00521-022-08099-z
  • R Core Team. (2020). R: A language and environment for statistical computing. https://www.r-project.org/
  • Receiver operating characteristic. (n.d.). Wikipedia. Retrieved May 16, 2023, from https://en.wikipedia.org/wiki/Receiver_operating_characteristic
  • Ribeiro, M. H. D. M., & dos Santos Coelho, L. (2020a). Ensemble approach based on bagging, boosting and stacking for short-term prediction in agribusiness time series. Applied Soft Computing, 86, 105837. https://doi.org/10.1016/j.asoc.2019.105837
  • Rouse, J. W., Haas, R. H., Schell, J. A., & Deering, D. W. (1974). Monitoring vegetation systems in the Great Plains with ERTS. https://www.google.com/books?hl=cs&lr=&id=bn_xAAAAMAAJ&oi=fnd&pg=PA309&dq=Rouse+Jr,+J.,+Haas,+R.+H.,+Schell,+J.+A.,+%26+Deering,+D.+W.+(1974).+Monitoring+vegetation+systems+in+the+Great+Plains+with+ERTS&ots=YTLtRFzPPL&sig=JIetyHbLrvzvD7K4S7y-5RJ3vVQ
  • Šandera, J., & Štych, P. (2019). Change detection work-flow for mapping changes from arable lands to permanent grasslands with advanced boosting methods. Geodetski Vestnik, 63(3), 379–394. https://doi.org/10.15292/geodetski-vestnik.2019.03.379-394
  • Šandera, J., & Štych, P. (2020). Selecting relevant biological variables derived from Sentinel-2 data for mapping changes from grassland to arable land using random forest classifier. Land, 9(11), 1–20. https://doi.org/10.3390/land9110420
  • Schapire, R. E. (2003). The boosting approach to machine learning: An overview. In D. D. Denison, M. H. Hansen, C. C. Holmes, B. Mallick & B. Yu (Eds.), Nonlinear estimation and classification (pp. 149–171). Springer.
  • Schmidt, S., Alewell, C., & Meusburger, K. (2018). Mapping spatio-temporal dynamics of the cover and management factor (C-factor) for grasslands in Switzerland. Remote Sensing of Environment, 211, 89–104. https://doi.org/10.1016/j.rse.2018.04.008
  • Sellers, P. J. (1985). Canopy reflectance, photosynthesis and transpiration. International Journal of Remote Sensing, 6(8), 1335–1372. https://doi.org/10.1080/01431168508948283
  • Sklenička, P. (2002). Základy krajinného plánování. Naděžda Skleničková.
  • Soons, M. B., Messelink, J. H., Jongejans, E., & Heil, G. W. (2005). Habitat fragmentation reduces grassland connectivity for both short-distance and long-distance wind-dispersed forbs. Journal of Ecology, 93(6), 1214–1225. https://doi.org/10.1111/j.1365-2745.2005.01064.x
  • Stacked Ensembles. (2023, May 16). H2O 3.40.0.4 documentation. https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/stacked-ensembles.html
  • Stumpf, A., & Kerle, N. (2011). Object-oriented mapping of landslides using random forests. Remote Sensing of Environment, 115(10), 2564–2577. https://doi.org/10.1016/j.rse.2011.05.013
  • Szostak, M., Hawryło, P., & Piela, D. (2018). Using of Sentinel-2 images for automation of the forest succession detection. European Journal of Remote Sensing, 51(1), 142–149. https://doi.org/10.1080/22797254.2017.1412272
  • Tan, X., & Xue, Z. (2022). Spectral-spatial multi-layer perceptron network for hyperspectral image land cover classification. European Journal of Remote Sensing, 55(1), 409–419. https://doi.org/10.1080/22797254.2022.2087540
  • Toland, J., Jones, W., Eldridge, J., Thorpe, E., & O’Hara, E. (2008). LIFE and Europe’s grasslands: Restoring a forgotten habitat. Office for Official Publications of the European Communities.
  • Tripathi, A., Tiwari, R. K., & Tiwari, S. P. (2022). A deep learning multi-layer perceptron and remote sensing approach for soil health based crop yield estimation. International Journal of Applied Earth Observation and Geoinformation, 113, 102959. https://doi.org/10.1016/j.jag.2022.102959
  • Trojáček, P. (2002). New land parcel identification system for agricultural subsidies in the Czech Republic.
  • Tucker, C. J. (1979). Red and photographic infrared. Retrieved June 14, 2021, from https://ntrs.nasa.gov/api/citations/19780024582/downloads/19780024582.pdf
  • Tucker, C. J., Newcomb, W. W., Los, S. O., & Prince, S. D. (1991). Mean and inter-year variation of growing-season normalized difference vegetation index for the Sahel 1981-1989. International Journal of Remote Sensing, 12(6), 1133–1135. https://doi.org/10.1080/01431169108929717
  • Vertès, F., Hatch, D., Velthof, G., Taube, F., Laurent, F., Loiseau, P., & Recous, S. (2007). Short-term and cumulative effects of grassland cultivation on nitrogen and carbon cycling in ley-arable rotations. Permanent and temporary grassland: plant, environment and economy. Proceedings of the 14th Symposium of the European Grassland Federation, Ghent, Belgium, 3-5 September 2007 (pp. 227–246). Belgian Society for Grassland and Forage Crops. https://www.cabdirect.org/cabdirect/abstract/20083018734
  • Wang, Q., Li, J., Jin, T., Chang, X., Zhu, Y., Li, Y., Sun, J., & Li, D. (2020). Comparative analysis of Landsat-8, Sentinel-2, and GF-1 data for retrieving soil moisture over wheat farmlands. Remote Sensing, 12(17), 2708. https://doi.org/10.3390/rs12172708
  • Wang, X., Tan, L., & Fan, J. (2023). Performance evaluation of mangrove species classification based on multi-source Remote Sensing data using extremely randomized trees in Fucheng Town, Leizhou city, Guangdong Province. Remote Sensing, 15(5), 1386. https://doi.org/10.3390/rs15051386
  • Wang, H., Yang, F., & Luo, Z. (2016). An experimental study of the intrinsic stability of random forest variable importance measures. BMC Bioinformatics, 17(1). https://doi.org/10.1186/s12859-016-0900-5
  • Wikipedia contributors. (2023, December 6). Savitzky–Golay filter. Wikipedia. https://en.wikipedia.org/wiki/Savitzky%E2%80%93Golay_filter
  • Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5(2), 241–259. https://doi.org/10.1016/S0893-6080(05)80023-1
  • Xu, X., Lin, H., Liu, Z., Ye, Z., Li, X., & Long, J. (2021). A combined strategy of improved variable selection and ensemble algorithm to map the growing stem volume of planted coniferous forest. Remote Sensing, 13(22), 4631. https://doi.org/10.3390/RS13224631
  • Xu, Y., Yu, L., Zhao, F. R., Cai, X., Zhao, J., Lu, H., & Gong, P. (n.d.). Tracking annual cropland changes from 1984 to 2016 using time-series Landsat images with a change-detection and post-classification approach. https://www.sciencedirect.com/science/article/pii/S003442571830419X
  • Zafari, A., Zurita-Milla, R., & Izquierdo-Verdiguier, E. (2019). Land cover classification using extremely randomized trees: A kernel perspective. IEEE Geoscience and Remote Sensing Letters, 17(10), 1702–1706. https://doi.org/10.1109/LGRS.2019.2953778
  • Zhang, S., Chen, H., Fu, Y., Niu, H., Yang, Y., & Zhang, B. (2019). Fractional vegetation cover estimation of different vegetation types in the Qaidam Basin. Sustainability, 11(3), 864. https://doi.org/10.3390/su11030864
  • Zhang, Y., Liu, J., & Shen, W. (2022). A review of ensemble learning algorithms used in remote sensing applications. Applied Sciences, 12(17), 8654. https://doi.org/10.3390/app12178654