543
Views
0
CrossRef citations to date
0
Altmetric
Research Article

Fractional crop-planting area projection by integrating geographic grid data and agricultural statistics based on random forest regression

ORCID Icon, ORCID Icon, , &
Pages 4446-4470 | Received 06 Jul 2023, Accepted 16 Oct 2023, Published online: 24 Oct 2023

ABSTRACT

Accurate fractional crop-planting area (FCPA) mapping is a challenging task as it requires leveraging the advantages of geographic data in detailed spatial expression and agricultural statistics in the description of crop types and quantitative characteristics. We present a robust method to disaggregate the agricultural statistics within each county unit to 1-km scale grid by combining particle swarm optimization (PSO)-based feature selection with Random Forest (RF) regression, and an iterative area-gapallocation(IAGA) method is implemented to reconcile the discrepancies between the disaggregating results and statistics. The agriculture in Gansu Province, China, is characterized by complex heterogeneous smallholder farming landscapes. We tested this methodology in Gansu and explored the synergistic estimation of FCPA for six types of crops (i.e. wheat, maize, oil-bearing, vegetable,orchards, and other crops) in 2010. The results showed that the derived FCPA maps matched well with the statistics in terms of quantity, while also providing spatial details. The quantitative evaluation results indicated that the derived FCPA had good accuracy,with a higher R2 above 0.97, a lower RMSE below 1%, and an absolute error between 1.53-5.24%. The proposed methodology provides valuable insights for practical large-scale FCPA mapping at a high spatial resolution in a cost-effective manner.

1. Introduction

Crops account for 15 × 106 km2 (i.e. approximately 40%) of the Earth’s ice-free land surface,providing important products for human survival, including food, vegetables, edible oil, fiber, and medicine (Fritz et al. Citation2015; Molotoks et al. Citation2018; Ramankutty et al. Citation2008). With the rapid growth of the global population, one of the greatest challenges facing humanity in the twenty-first century is the ability to address the growing demand for more crop products while also mitigating the environmental concerns caused by agriculture, including the loss of biodiversity, fragmentation and loss of habitats, shortage and waste of water resources, and important greenhouse gas emissions (Foley et al. Citation2011; See et al. Citation2015). Humans have changed the landscape of the Earth through farming and other activities, which have directly affected global change processes, such as climate, hydrology, and the biological Earth cycle (Foley et al. Citation2005; Li et al. Citation2023; Ramankutty et al. Citation2008; You et al. Citation2014). Crop-planting area and its spatial distribution information are important data sources for global ecological environment change analysis, Earth system model development, and environmental sustainability assessment, as well as other important data sources for studying regional food balance and predicting the comprehensive production capacity of agricultural resources and population carrying capacity (Wardlow and Egbert Citation2008; Wu et al. Citation2014). Therefore, timely and reliable information on the crop planting area and its spatial distribution are of fundamental importance for better crop growth monitoring, crop yield estimation, crop-planting structure adjustment and optimization, agriculture policymaking, and food security assurance (Thenkabail et al. Citation2012).

Traditionally, crop type and planting area data are acquired through manual field surveys and censuses . For example, the multilevel tabular sampling method (household-village-county-prefectural-province-national) usually is used by national statistical departments to acquire crop-specific areas at high aggregate levels (e.g. country, prefectural, or province). Although agricultural statistics lack the ability to provide detailed spatial distribution information of crops, they often have high reliability in describing crop types and quantitative characteristics, which also recognized by major international organizations, such as the Food and Agriculture Organization of the United Nations (FAO), the World Bank, and the Consultative Group on International Agricultural Research (CGIAR) ( Ramankutty et al. Citation2008). With the attendant benefits of independence and continuity records over time, agricultural statistics often serve as supplementary and reference data for remote sensing–based crop mapping at various administrative levels (Hu et al. Citation2021; Monfreda, Ramankutty, and Foley Citation2008). Limited by low timeliness, low data sharing, and data gaps, as well as being easily interfered with manual operation, time-consuming, and labor-intensive, agricultural statistics do not meet the needs of real-time fine monitoring of crop-planting areas on a large spatial scale.

Remote sensing technology, because of its advantages of comprehensive, macroscopic, frequent data acquisition, and relatively cost-effective observation, has provided a new technical means for rapid and dynamic solutions for regional- to global-scale crop mapping (Begue et al. Citation2018; Song et al. Citation2017; Xiao et al. Citation2005). Since the 1980s, multisensor, multiresolution remote sensing images, such as Landsat, MODIS, AVHRR, and Sentinel, have been used extensively for to identify crop types and to map planting areas (Hu et al. Citation2016; Massey et al. Citation2017; Skakun et al. Citation2017; Xiao et al. Citation2005). Among these images, MODIS data have shown the greatest potential for large-scale crop mapping because of its unique combination of short revisit cycles (daily), rich spectral bands (36 bands), moderate spatial resolutions (250, 500 m), and global coverage (Chen et al. Citation2018; Pittman et al. Citation2010; Zhong et al. Citation2016). The physical theoretical foundation of crop mapping by remote sensing is evident in the different spectral reflectance signatures of crops (Hartfield et al. Citation2013; Van Niel and McVicar Citation2004), because the biophysical characteristics of crops can be directly reflected in the spectral signal, which continuously change over time. The crop phenophase is the specific theoretical foundation of crop recognition by remote sensing. Because each crop type has its own specific growing pattern and fixed seasonal rhythm feature, understanding how to comprehensively analyze the differences in crop phenological characteristics has become a core problem in distinguishing specific crops from other crops and green vegetation (Bargiel Citation2017Foerster et al. Citation2012; Pena-Barragan et al. Citation2011). The spectral and phenological signatures of crops can complement each other, providing the necessary information for crop identification based on remote sensing (Pan et al. Citation2012).

Several studies have used supervised or unsupervised classification algorithms dedicated to mapping crops from multitemporal or single-date remote sensing images throughout the growing season (Hu et al. Citation2016; Lobell and Asner Citation2004; Xiong et al. Citation2017). The core of the single date-based method is to find the most appropriate image in the crucial phenological phase to optimize the spectral separability of different crop types. Although this method is simple and easy to use, it uses medium and high spatial-resolution optical remote sensing images, such as Landsat and Sentinel-2, because of the long revisit cycle and relatively small swath of such sensors and the impact of cloud and rain weather conditions. The crucial phenological phase images are often insufficient, and it is extremely difficult to identify two or more crops that have similar spectra. Generally, the crop-mapping method using multitemporal images selects one or more feature variables, such as the normalized difference vegetation index (NDVI), enhanced vegetation index (EVI), or leaf area index (LAI), and then determines the threshold value of the ‘spectrum-time series’ curve for each feature variable at a specific time to identify a crop. Compared with single-date remotely sensed data, multitemporal images can accurately depict the spectral dynamics of crops throughout the entire growing period and better combine the spectral and phenological signatures of crops. This approach is promising for capturing subtle but crucial phenological differences among various crop types (Foerster et al. Citation2012; Zhang et al. Citation2015). Crop-mapping methods applied to multitemporal images have been proven to perform better than single date–based methods and have been successfully implemented at the regional scale (Begue et al. Citation2018; Gumma et al. Citation2014; Luo et al. Citation2020; Pena-Barragan et al. Citation2011; Zhong, Gong, and Biging Citation2014) as well as at the global scale (Pittman et al. Citation2010; Salmon et al. Citation2015; Thenkabail and Wu Citation2012). Different crop types, however, may exhibit similar spectral characteristics at a given time point. Although these hard classification methods, which aggregate all crop types into a single category in a pixel, provide an efficient means to map crop type and extent, they may result in significant errors in estimated crop areas because of the spatial resolution limitations of the images. Notably, heterogeneous agricultural landscapes characterized by small-scale family farms and excessive land fragmentation often have the problem of mixed pixels (Clauss, Yan, and Kuenzer Citation2016; Lobell and Asner Citation2004).

To address this ‘mixed pixels’ problem caused by coarse resolution in heterogeneous agricultural regions on crop mapping, some studies have explored subpixel mapping of crops and have obtained the fractional abundance of each type of crop on a pixel. Existing methods can be group into two categories: multiple regression and linear spectral unmixing methods. Multiple regression methods generally use traditional statistical regression analysis or machine learning methods to develop quantitative relationships between the subpixel crop fraction and a set of input variables involving spectral signatures from different temporal periods. For example, Pan et al. (Citation2012) developed a linear regression model between the MODIS EVI time series and actual crop acreage to estimate winter wheat fractions within 250 m of spatial resolution in two agricultural areas of China. Liu et al. (Citation2018) used linear regression to construct the relationship between the coefficient of variation of the land surface water index derived from MODIS and paddy rice reference planting fraction calculated from high-resolution Gaofen2 images. Atzberger and Rembold (Citation2013) used a neural network to develop nonlinear regression relationships between the AVHRR NDVI time series and fractional crop area within a 1-km spatial resolution in central Italy. In addition, a stepwise regression model was established using the MODIS EVI time series and Landsat-derived fractional cropland data to estimate the cropland fraction at a regional scale (Zhu, Zhang, and Huang Citation2019). The spectral unmixing method assumes that the reflectance of a pixel is a linear combination of the surface component spectra within a pixel and that the weight of each component is proportional to the area occupied by that component. For example, Lobell and Asner (Citation2004) used a probabilistic linear unmixing method to estimate subpixel crop area fractions based on MODIS time series reflectance signatures. Subpixel crop mapping methods have created new possibilities for generating more accurate crop planting area data. Because of different field management (e.g. irrigation, fertilization, and sowing), planting modes (e.g. single or double cropping, continuous cropping, rotation, intercropping and interplanting), local weather conditions, and soil quality, two phenomenon often occur: (1) different crop types with the same spectrum and (2) variations in spectral signatures of the same crop type. Variations in spectral signatures across the growing period and growing calendar, as well as the spectral variability of crops, including intraclass and interclass variability, also may vary with locations (Pena-Barragan et al. Citation2011; Sakamoto, Wardlow, and Gitelson Citation2011). Consequently, the relationships between the subpixel crop fraction and spectral signature can vary across geographical spaces. This type of fractional abundance mapping can provide the proportion of specific crops in a pixel, but it cannot provide the specific location of crops. Super-resolution mapping (SRM), can be considered as a follow up processing technology of fractional abundance mapping, aims to explore both the proportion and location by reconstructing finer-scale images from coarser images and has achieved significant progress in land cover mapping. For example, Wang et al. (Citation2020) constructed an SRM scheme that first utilized spectral unmixing to calculate the land cover proportions corresponding to the subpixels and then allocated the land cover labels to all subpixels by class allocation according to their land cover proportions. He et al. (Citation2022) used deep learning-based SPM for super resolving 30 m Landsat images to generate meter-level high-resolution land cover maps for 28 metropolises in China. To our knowledge, however, the SRM techniques have not yet been applied to crop-mapping applications, which may represent a promising research direction for fine crop mapping.

Note that the subpixel crop mapping has led to the transformation of the crop-mapping mode from the initial single type of crop mapping to the planting structure monitoring composed of multiple crops. Traditional hard classification is considered to be single-crop-type mapping because it only classifies a pixel according to a certain type of crop. For example, single-category mapping of paddy rice (Clauss, Yan, and Kuenzer Citation2016; Dong et al. Citation2015; Frolking et al. Citation2002; Gumma et al. Citation2014; Xiao et al. Citation2005; Zhang et al. Citation2015), wheat (Zhong et al. Citation2019), maize (Zhang, Feng, and Yao Citation2014), and soybean (Song et al. Citation2017; Zhong, Gong, and Biging Citation2014) has been extensively explored. Although some studies have focused on multicrop mapping, such as soybean and corn (Zhong, Gong, and Biging Citation2014; Citation2016), as well as mapping of four crops of rice, maize, soybeans, and wheat (Hu et al. Citation2016), or even more types of crops(Akbari et al. Citation2020; Bargiel Citation2017; Foerster et al. Citation2012; Wardlow and Egbert Citation2008), they have generated hard classification maps for each type of crop separately and have not considered the harmony of multiple crops. Thus, they are essentially single-crop-type mapping. Early subpixel abundance mapping has explored the estimation of the proportion of single crops, such as wheat (Atzberger and Rembold Citation2013; Lobell and Asner Citation2004; Pan et al. Citation2012), paddy rice (Liu et al. Citation2018), and maize (Sakamoto, Wardlow, and Gitelson Citation2011). Crop-planting structure monitoring should consider the relative contributions of all crops to a planted area (Wu and Li. Citation2012; Pena-Barragan et al. Citation2011). For example, cropping patterns (e.g. soy-maize, soy-cotton, soy-pasture, soy-fallow, fallow-cotton, and single crop) in the state of Mato Grosso, Brazil, were investigated by Chen et al. (Citation2018). Yin et al. (Citation2020) first generated probability maps for three crops of rice, corn, and soybean, and then the probability maps were composited into a final crop layer by considering the probability of each crop at every pixel. China's global crop-monitoring system (CropWatch) has used remote sensing data and field data to determine key crop production indicators, such as the planting proportion of more than 10 major crops (wheat, rice, and maize) in China (Wu et al. Citation2014). The National Agricultural Statistics Service (NASS) of the US Department of Agriculture (USDA) has produced a nationwide crop-specific land cover map of the United States each year since 2008, covering 155 classes of cultivated crops (Lark, Schelly, and Gibbs Citation2021).

Because of the influence of imaging conditions, image acquisition capability, mixed pixels, and scale conversion, crop types and areas generated based on hard classification and subpixel abundance mapping methods are often inconsistent with statistics. Therefore, it has been difficult to achieve ideal crop-mapping accuracy by simply relying on satellite-based datasets. The latest progress in crop mapping is the emergence of new ‘data fusion’ techniques to integrate satellite images with agricultural statistics to produce high-accuracy crop type and area maps (See et al. Citation2015). By assuming that the agricultural statistical data are the reference truth, some previous studies have used a satellite-derived land cover dataset to determine the spatial distribution of cropland, and a statistical downscaling model with several constraints was developed to spatially disaggregate the statistical crop-planting or crop-harvesting areas within each geopolitical unit scale to the grid (Frolking et al. Citation2002; Monfreda, Ramankutty, and Foley Citation2008; You and Wood. Citation2006). The cross-information entropy method has been used to allocate crop statistical information to 5 arcmin grids and to produce spatial distribution maps of world crops ( You et al. Citation2014). Hu et al. (Citation2021) mapped subpixel crop distributions by integrating MODIS time series and agricultural statistics using a random forest (RF) regression model with high-spatial-resolution reference crop distribution maps (i.e. SPOT and Landsat) as training samples. Combining satellite-based datasets with agricultural statistics has been a powerful way to map crop-planting and harvesting area distributions at regional or global scales. In particular, advanced machine learning technology has shown great potential for combining statistics with satellite observations and transferring knowledge to the pixel level. Because of the relatively coarse spatial resolution and low mapping accuracy, existing large-scale crop-planting and harvesting area maps produced by these methods often have shown poor regional suitability, especially under conditions of complex terrain and heterogeneous planting structure. In addition, existing regional-scale crop-mapping methods often require high-resolution reference crop distribution maps or a large quantity of ground survey data as the response target. It often is difficult to obtain this information or requires experts to perform interpretations based on prior knowledge, which results in poor universality of the method.

Based on these achievements, we selected Gansu Province in northwest China, which is characterized by a heterogeneous small-scale agricultural system, as the research area. The objectives of this study were twofold: (1) by leveraging the power of satellite data in the detailed expression of crop spatial distribution and statistical data in the description of crop types and quantitative characteristics, we proposed a robust method combining particle swarm optimization (PSO)–based feature selection with RF regression to allocate crop-plating area statistics to the 1 km×1 km grid scale without the need of high-resolution reference crop distribution maps or ground survey data support; and (2) we explored the planting structure monitoring of fractional abundance of multiple crop types in regions characterized by complex terrain and heterogeneous smallholder farming landscapes. This paper is organized as follows: Section 2 introduces the geographical information of the study area and describes the study dataset that was used to develop the subpixel fractional crop-planting area (FCPA) mapping model. Section 3 presents the details of methodology. Sections 4 and 5 present the results and a discussion, respectively. Finally, the conclusions are presented in Section 6.

2. Study area and datasets

2.1. Study area

Located in the inner regions of Northwestern China, Gansu Province (32°31′–42°57′N and 92°13′–108°46′E) () is an agricultural province whose primary production is grain and cash crops. The geographical environment and climatic conditions, however, are not advantageous for agricultural development. In terms of geographical environment, the region covers an area of 42.58 × 104 km2 and has a complex terrain with an elevation ranging from 624 to 5602 m. The physiognomies of Gansu Province are complex and diverse and include six ecological districts: the gully region of the Loess Plateau, the Qinling Zhongshan canyon area, the grassland meadow area of Gannan Plateau, the hilly area of the Loess Plateau, the Hexi Corridor gobi district, and the desert district. In terms of climatic conditions, the various climate types range from humid (southwest) to semi-humid, semi-arid, and arid. The mean annual temperature is 273.16–287.26 K (An et al. Citation2020). Water resources are scarce and unevenly distributed; for example, the annual total precipitation ranges from 40 to 800 mm, with a drying gradient from southeast to northwest (Wang et al. Citation2017). Owing to the unique geographical and climatic conditions, the study area is characterized by the smallholder farming systems with tremendous heterogeneity. Although the study area is rich in crop variety resources, large regional differences in agricultural production, complex planting structures, heterogeneous and fragmented landscapes, diverse cropping patterns, and cultivation habits have made the mixed pixel effect and scale sensitivity prominent, posing a serious challenge for crop mapping.

Figure 1. Overview of the study area: (a) geographical location; (b) six ecological districts; (c) elevation; (d) land cover and land use.

Figure 1. Overview of the study area: (a) geographical location; (b) six ecological districts; (c) elevation; (d) land cover and land use.

2.2. Datasets and preprocessing

The data used in this study, including geographic and agricultural statistical data, are shown in . We used geographic data to provide spatial distribution details for subpixel crop mapping, and the county-level agricultural statistics served as the reference ‘truth’ within an administrative unit.

Table 1. Geographic data and agricultural statistics for fractional crop-planting area mapping.

2.2.1. Geographic data

The geographic data used in this study included China’s land use and land cover dataset (CNLULC), China’s five-day NDVI composite product (MODND1F), digital elevation model (DEM) of China (Tang Citation2019), World Database of Protected Areas (WDPA) (UNEP-WCMC, Citation2022), and China land cover dataset (CLCD). The CNLULC is a multiperiod remote sensing monitoring dataset of land use and land cover change in China, which currently includes 10 periods of data from 1980, 1990, 1995, 2000, 2005, 2010, 2013, 2015, 2018, and 2020 (Xu et al. Citation2018). CNLULC adopted a hierarchical classification system, which include the following six classes of land cover: cropland, woodland, grassland, waterbody, unused land, and built-up land. We divided the six classes of land cover were into 25 land cover classes, as shown in . The spatial resolution of CNLULC was 100m, and the overall accuracy of the 25 classes of land use was above 91.2% (Liu et al. Citation2014). The MODND1F product was generated by taking the five-day maximum value from the daily 500 m MODIS/Terra NDVI built from daily surface reflectance data (Chen et al. Citation2004), the five-day temporal intervals fully captured the subtle biophysical differences in critical crop growth stages. The China land cover dataset (CLCD) is the first fine-resolution (i.e. 30 m) Landsat-derived annual land cover dataset of China for 1990–2021. It has better accuracy than other existing fine-resolution land cover products, including ESACCI_LC, FROM_GLC, and GlobeLand30 (Yang and Huang Citation2021). In this study, taking the year 2010 as an example, we extracted the corresponding data for Gansu Province from these datasets, in which the NDVI data considered only the growing season of crops (i.e. April–September). Then, we projected all of the data in the study area to the Krasovsky ellipsoid coordinates and Albers projection coordinate system, and the NDVI data were resampled to a resolution of 1 km.

Table 2. CNLULC land use classification scheme.

2.2.2. Agricultural statistics

We downloaded country-level agricultural statistics for the year 2010 from the Gansu Province Bureau of Statistics, China, which provides the sown area for each crop type in 87 counties in Gansu Province. In this study, we reclassified all crops into the following six classes: wheat, maize, oil-bearing, vegetables, orchards, and other crops.

3. Methodology

The proposed methodology for integrating geographic data and agricultural statistics for subpixel crop-type mapping consisted of four main parts (): (1) feature extraction, (2) feature selection, (3) disaggregation of statistics data within each county unit to 1 km × 1 km, and (4) adjustment of the predicted crop planting area value to match the statistical data.

Figure 2. Workflow of integration of geographic data and agricultural statistics to generate FCPA estimates.

Figure 2. Workflow of integration of geographic data and agricultural statistics to generate FCPA estimates.

3.1. Feature extraction

We extracted relevant features from the geographic data as potential predictive variables for FCPA mapping. First, we calculated the proportion of each of the 25 land cover classes per 1-km pixel using 100 m of CNLULC data. In addition to providing NDVI features of 35 time phases, we further extracted 16 phenological features (i.e. beginning of season, end of season, length of season, base value, time of middle of season, seasonal amplitude, rate of increase at the beginning of the season, rate of decrease at the end of the season, large integrated value, small integrated value, value for the start of the season, value at the end of the season, maximum value, minimum value, mean value, and variance value) from the NDVI time series of the growing season (i.e. April–September) using the TIMESAT program (Jonsson and Eklundh Citation2004) and statistical calculations. We used digital elevation model (DEM) data to extract three topographic features: elevation, slope, and aspect. Thus, we extracted 79 feature variables as potential predictive variables. The WDPA data were used to mask terrestrial and inland water protected areas within the study area. We used the CLCD to calculate the proportion of cropland within each 1-km pixel as an independent validation dataset.

For each county-level administrative unit, we calculated the proportion of each of the six crop class planting areas by dividing the statistical sown areas of each category of crops by the total land area after masking the protected area for the administrative unit, which were used as response variables for FCPA mapping.

3.2. Feature selection

Despite the apparent functionality of providing more information, a large number of studies have proven that blindly increasing the number of input features will does not increase the modeling accuracy but instead increases the complexity and reduces the efficiency of the model, leading to that is called the ‘curse of dimensionality.’. Moreover, multiple feature variables formed by a single feature extracted from multitemporal remote sensing images may be highly correlated or redundant. Irrelevant and redundant features may even reduce the performance. Feature selection aims to select an optimal subset of input features (even though the number of features is small, it contains the most effective features) to achieve similar or even better modeling performance than using all features while reducing computational costs and data redundancy (Gilbertson and van Niekerk Citation2017; Xu, Du, and Younan Citation2017).

Because of the two conflicting objectives in feature selection—that is, maximizing the modeling performance and minimizing the number of features—it is necessary to generate a Pareto front of nondominated solutions (feature subsets) for feature selection that makes a compromise between these two objectives. Most existing feature selection algorithms, however, treat the task as a single objective problem. Xue et al. (Citation2012) proposed a multi-objective PSO feature selection scheme and revealed that the PSO-based multi-objective feature selection not only effectively remove redundant features but also kept or yielded better accuracy and reducd computational complexity. Therefore, we used this method to select the informative features subsets in this study.

The PSO algorithm simulates the behavior of bird flocking and is an efficient global search technique for feature selection (Kennedy and Eberhart Citation1995; Xue, Zhang, and BrowneSchool Citation2014). In this study, we adopted binary PSO-based feature selection to automatically evolve a feature subset from the original feature set as prediction variables to improve the accuracy of crop planting area mapping. Note that because the target response variable is at the county level, we had to aggregate 79 feature variables of 1 km × 1 km spatial resolution to the county level to match the response variable. We performed regional statistics and average operations to calculate the mean values of each feature variable in counties to obtain the original feature set of 79 feature variables at the county level. A particle in the population was a candidate solution, which was formed by a binary vector with a length equal to the number of the original features; its cell value was Boolean, where ‘1’ means that the corresponding feature is selected and ‘0’ otherwise. Suppose that in an N-dimensional (79 in this paper) target search space, there are m particles to form a population, where the position vector of the i-th particle is Pi=(pi1,pi2,,pin,,piN), i=1,2,,m, and pin{0,1}. Each particle also has a velocity vector Vi=(vi1,vi2,,vin,,viN), which represents the direction and speed of particle motion.

The binary PSO-based multi-objective feature selection process is as follows (Xue et al. Citation2012; Citation2014), and we performed this process using the FeatureSelect software (Masoudi-Sobhanzadeh, Motieghader, and Masoudi-Nejad Citation2019).

  1. Initialization. The population of particles was first initialized randomly in a multidimensional search space.

  2. The fitness of the particles was calculated. This study used RF regression to train particle samples in the population and estimate the accuracy of each particle.

  3. Iteration. Each particle searched for the optimal solution separately in the search space, recorded it as the current local optimal solution pbesti(t) at iteration t, and further determined the global optimal solution gbest(t) in all pbesti(t). The update of particle velocity and position can be explained using Equations (1) and (2): (1) Vi(t+1)=wVi(t)+c1r1(pbesti(t)Pi(t))+c2r2(gbest(t)Pi(t)),(1) (2) {Pi(t+1)=1ifr3(t)<s(Vi(t+1))Pi(t+1)=0else,(2) where si(Vi(t+1))=1/(1+Vi(t+1)) is the sigmoid transformation function; c1and c2 are acceleration constants (set as c1=c2=1.496 in this study); r1, r2, and r3 are random values between 0 and 1; w is the inertia weight with an initial value of 1 and a damping ratio of 0.99; and the maximum number of iterations was 1,000.

3.3. Disaggregating the statistics data to 1 km × 1 km

In this study, we selected the RF regression algorithm to establish the relationship between the prediction and response variables to disaggregate the statistical data within each county unit to 1 km × 1 km. RF is a fast, simple, dynamic, and durable classification and regression model very few limitations, and it has been proven to generate good predictions on similar tasks (Hu et al. Citation2021; Li, Hou, and Huang Citation2021; Nicolas et al. Citation2016). The agriculture statistics are considered to be the reference ‘truth’ for the crop-planting area, and we used the geographic data to spatially locate the specific crop within an administrative unit and derived the total area of the crop-planting area in an administrative unit from the statistics data.

RF is an integrated learning algorithm that uses a bagging procedure to integrate multiple decision trees (Breiman Citation2001). The basic idea of RF regression is to average a collection of noisy but approximately unbiased tree models, and hence reduce the variance. We achieved variance reduction in RF by randomly selecting input variables during tree growth. The disaggregation process of crop statistical data based on RF regression is as follows:

  1. RF-based subpixel crop mapping model at the county level.

For crop class k, assuming that the above PSO feature selection process has selected Mk (k=1,2,,6) features, we used these features as prediction variables, and used the crop area proportion as the response variables to generate a training sample set. The RF-based regression model was then constructed for each crop class, which included the following two main steps:

  • Step 1. For b = 1 to B: (a) Perform a bootstrap sample Z of size M (the number of county-level administrative units, 87 in this study) from the training sample set Z. (b) Grow a decision tree Tb using the bootstrapped sample Z by recursively repeating the processes of randomly selecting m variables from Mk features, picking the best split point among m, and splitting the mode into two daughter nodes for each leaf node of the tree. In addition, to clarify the effectiveness of feature selection, we constructed a corresponding RF subpixel crop-mapping model that did not consider feature selection.

  • Step 2. Output the ensemble of decision trees {Tb}1B, and take the average value of the prediction results of B decision trees as the prediction value of the RF-based model.

  • 2. RF-based subpixel crop-mapping model at the 1 km × 1 km scale

Assuming that the previous quantitative relationship was invariable on a multiscale, we directly applied the constructed six RF models at the county scale to the 1 km ×1 km scale. In particular, we directly used the selected corresponding feature variables of each crop class at a 1 km × 1 km spatial resolution as the inputs of the established RF models, and the output of the RF models was the estimated subpixel crop planting area at the 1 km × 1 km grid scale.

3.4. Adjusting the predicted crop planting area value to match the agricultural statistics

The total area of the subpixel crop-planting area at 1 km × 1 km grid scale estimated by the RF models was inevitably inconsistent with the statistical data, and the errors were affected by two main factors: potential assumptions and six RF models are independent of each other. The potential assumption of this study was that the relationship between statistical proportion of crop-planting area and the 79 feature variables was identical at the county scale and at the 1 km × 1 km grid scale. Nevertheless, we found clear differences in the distribution characteristics of the input predictive variables at the two scales, and these differences inevitably led to errors when the models established at the county scale were directly applied to the grid scale. During the process of developing the subpixel crop-mapping model, the supervised RF method accommodated only a single response variable composed of the six classes. Although we developed six separate RF models for wheat, maize, oil-bearing, vegetables, orchards, and other crops, the six independent models may have caused the sum of the areas of the six class crops on a pixel to exceed 100%. Therefore, we needed to adjust our spatially explicit predictions from the RF models to match the agriculture statistics at the county scale. In this study, we adopt the iterative area gap allocation (IAGA) method. The specific adjustment process is shown in , which includes the following three steps:

  • Step 1. Aggregate the subpixel planting area distribution data for each class of crops predicted on a 1 km × 1 km scale to the county scale. Assume that the aggregated planting area is Yij for crop class i in county j and Aij is the sown area of crop class i in county j from the agricultural statistics.

  • Step 2. For each crop class i in county j, calculate the area gap Gij between the predicted and statistical area values, as follows: (3) Gij=AijYij.(3)

Figure 3. Workflow of adjusting the predicted crop-planting areas by RF models to match the agricultural statistics.

Figure 3. Workflow of adjusting the predicted crop-planting areas by RF models to match the agricultural statistics.

If the area gap between the two was smaller than 5% of the statistical area value, no adjustment was made; otherwise, the correction factor was cfij, as follows: (4) cfij=Aij/Yij.(4)

We adjusted the subpixel crop area values of crop class i in county j to the original predicted value multiplied by cfij.

  • Step 3. The sum of the subpixel planting areas of the six crop classes on a pixel (x, y) is calculated as follows: (5) SAxy=i=16Yxyi,(5) where Yxyi is the subpixel planting area of crop class i in pixel (x, y). If the area SAxy is not greater than 100%, no adjustment is made; otherwise, the subpixel planting area of each crop class in this pixel is divided by SAxy. Then, we return to Step 1 and repeat until no adjustment is required.

3.5. Performance evaluation

We used three common performance indicators—the coefficient of determination (R2), root mean square error (RMSE), absolute bias (AB), and structural similarity (SSIM) index—to evaluate the performance of the proposed FCPA mapping method.

4. Results

4.1. Feature selection

The selected optimal subset of input features for each of the six crops is shown in , which reveals that the optimal subset varied among the different crops, mainly because crop distribution was influenced by the interaction of factors, including climate conditions, soil conditions, topography, human factors, and the biological characteristics of the crops. The optimal subset of land cover for wheat, maize, oil-bearing, vegetable, orchards, and other crops contained 46, 44, 44, 43, 45, and 51 features, respectively. The main land cover features selected in this study were paddy land, grassland, streams and rivers, other built-up land, and other unused land (a). Of note, the land cover class of beaches and shores is redundant features and was not selected into the optimal subset for all six crops as the proportion of beaches and shores in the study area is 0. These results were consistent with reality. As is well known, most crops are planted in these land cover areas and Gansu Province contains all landforms except the ocean. The key phenological features selected were the length of the growing season, the middle of the growing season, and the large integrated value, which were also the most direct responses to the biological characteristics of crops. The selected temporal NDVI features were concentrated primarily in the growth stages of germination, small-term, medium-term, and maturity, which, to some extent, represented the growth status of crops. In terms of topographic features, the impact of slope seems to have been slightly greater, possibly because of the negative correlation between slope and crop distribution.

Figure 4. Selected features and their importance ranking: (a)–(f) represents six types of crops (i.e. wheat, maize, oil-bearing, vegetable, orchards, and other crops, respectively).

Figure 4. Selected features and their importance ranking: (a)–(f) represents six types of crops (i.e. wheat, maize, oil-bearing, vegetable, orchards, and other crops, respectively).

4.2. Crops distribution

The distribution of six types of crops based on statistical data and the corresponding disaggregating results of FCPA at 1-km resolution based on RF regression models with or without PSO feature selection over the study area are shown in . It is evident the planting area of wheat, corn, and other crops in the research area was relatively large, whereas the area of oil-bearing, vegetables, and orchards was relatively small. The spatial patterns of downscaled crop distributions using RF-based models were correlated with agricultural statistics. However, we still observed mismatching in the high-concentration crop areas and some of the low crop coverage areas between the statistical data and downscaling results, and the most notable differences were found in vegetables ( d1-d3) and orchards ( e1-e3), which had relatively small planting areas. The statistical data and RF-based downscaling results showed that all six crops were distributed mainly in the Hexi Corridor and the southeastern region of the study area. The Hexi Corridor is mainly irrigated agriculture, whereas southeastern Gansu has relatively more rainwater resources and milder temperatures, providing better natural conditions for crop growth. Statistical data provided only quantitative characteristics of crop planting areas in counties, and subpixel crop distribution maps derived by RF-based downscaling models provided specific planting locations and spatial details for various crops. Moreover, the subpixel crop-mapping results of the RF-based models with and without feature selection were very similar, and from a visual perspective, we found only a weak difference between the two.

Figure 5. Comparison of agricultural statistics with disaggregating estimates based on RF-based regression models. The first to sixth rows of subplots from top to bottom represent six types of crops (i.e. wheat, maize, oil-bearing, vegetable, orchards, and other crops) respectively. The first to third columns from left to right are (a1–f1) agricultural statistics, and the predicted FCPA by RF models (a2∼f2) without feature selection and (a3∼f3) with PSO feature selection.

Figure 5. Comparison of agricultural statistics with disaggregating estimates based on RF-based regression models. The first to sixth rows of subplots from top to bottom represent six types of crops (i.e. wheat, maize, oil-bearing, vegetable, orchards, and other crops) respectively. The first to third columns from left to right are (a1–f1) agricultural statistics, and the predicted FCPA by RF models (a2∼f2) without feature selection and (a3∼f3) with PSO feature selection.

As mentioned in Section 3.4, because of the potential assumptions when developing the RF-based models and the limitation of the independence of the six crop-mapping models, the downscaling results were inevitably inconsistent with the statistical values. The agricultural statistical distribution and adjusted subpixel planting areas of the six crops are shown in . Evidently, the adjusted spatial distribution of the six crops had a better match with agricultural statistics, and the scope of crop aggregation areas was more consistent with the actual situation. The corrected subpixel crop planting area had a clearer spatial granularity and more prominent spatial contours, which better reflected the details of the spatial distribution of each type of crop. Thus, we considered that the RF-based disaggregating models that combined postprocessing correction performed a reasonable job of reproducing agricultural statistical data and obtaining detailed information on the spatial distribution of crop planting in the study area.

Figure 6. Comparison of agricultural statistics with disaggregating estimates based on adjusted RF regression models. Except for adjusting the prediction of RF-based models, the symbols are identical to those in .

Figure 6. Comparison of agricultural statistics with disaggregating estimates based on adjusted RF regression models. Except for adjusting the prediction of RF-based models, the symbols are identical to those in Figure 5.

4.3. Accuracy evaluation

4.3.1. Comparison with agricultural statistics

The regional comparison of agricultural statistics and RF-based downscaling results aggregated at the administrative unit level is shown in the scatter plot in . This verified the conclusion given in Section 4.2 that the RF-based downscaling results were well correlated with the statistical data, regardless of whether or not we considered feature optimization. However, we also noted a certain degree of mismatch, which could be explained as a phenomenon of significant overestimation and underestimation in the estimated crop-planting area aggregated at the county level. The corrected RF-based models significantly improved the matching degree and alleviated the problems of overestimation and underestimation. Moreover, the scatter between the estimated and statistical values of the RF-based models containing PSO feature selection was slightly closer to the 1:1 line. The quantitative performance evaluation results for R2, RMSE, and AB in further confirmed this description. clearly shows that the corrected RF models had significantly improved accuracy compared with the uncorrected ones, with higher R2 above 0.97, and much lower RMSE and AB. The RF-FS-C model with the best performance had RMSE values below 1% and AB values of 2.83%, 1.53%, 2.93%, 4.61%, 5.24%, and 3.27% for six crops of wheat, maize, oil-bearing, vegetable, orchards, and other crops, respectively. Although the performance of the uncorrected RF models was somewhat poor (particularly for oil-bearing and vegetable), the accuracy was still within an acceptable range in most cases, such as for both wheat and maize with R2 > 0.73. Because of the complex terrain and heterogeneous smallholder farming landscapes in the study area, the RF regression model inevitably encountered problems of low-value overestimation and high-value underestimation. When the predictive factors were taken as the mean value for each variable at the 1-km scale in the county, the scale effect exacerbated the problem of low-value overestimation in the RF-based models. The RMSE and AB of wheat, maize and other crops appeared to be larger, mainly because the total planting area of these crops was much higher than that of the other three crops. Our results suggested the advantages of integrating satellite-based datasets and agricultural statistics to map subpixel crop-type distributions and to provide a consistent estimation of crop acreage.

Figure 7. Comparison of agricultural statistics with RF-based downscaled estimates aggregated to the administrative unit level. The first to sixth rows of subplots from top to bottom represented six crops of wheat, maize, oil-bearing, vegetable, orchards, and other crops, respectively. The first to fourth columns from left to right represent downscaled results from ‘RF,’ ‘RF-FS,’ ‘RF-C,’ and ‘RF-FS-C,’ respectively. A red line portrays the 1:1 line for the estimated census relationship and each scatter represents a county.

Figure 7. Comparison of agricultural statistics with RF-based downscaled estimates aggregated to the administrative unit level. The first to sixth rows of subplots from top to bottom represented six crops of wheat, maize, oil-bearing, vegetable, orchards, and other crops, respectively. The first to fourth columns from left to right represent downscaled results from ‘RF,’ ‘RF-FS,’ ‘RF-C,’ and ‘RF-FS-C,’ respectively. A red line portrays the 1:1 line for the estimated census relationship and each scatter represents a county.

Table 3. Accuracy of the RF-based FCPA mapping models on the county scale.

4.3.2. Differences in six ecological districts

To explore the spatial distribution characteristics of each type of crop in the study area and the error distribution of various RF-based FCPA mapping models, we calculated the statistical planting area of each type of crop in six ecological districts and the estimated planting area aggregated to the ecological district level (). It was evident that all six types of crops were distributed in the hilly areas of the Loess Plateau and in the gully region of the Loess Plateau, followed by the Qinling Zhongshan canyon area and Hexi Corridor gobi district, and the least distributed in the desert and grassland meadow areas of the Gannan Plateau. The estimation of various models in the gully region of the Loess Plateau was more accurate, possibly because of the relatively flat terrain and the lower heterogeneity in this district. In addition, the adjusted RF-based model estimates the planting area of various crops with excellent consistency across the six ecological districts, except for the types of other crops that are slightly underestimated in the hilly areas of the Loess Plateau and gully regions of the Loess Plateau. The uncorrected RF crop estimation model of oil-bearing, vegetables, and orchards had the greatest overestimation, especially in the hilly area of the Loess Plateau and Qinling Zhongshan canyon area. For wheat, corn, and other crops with larger planting areas, the estimation of wheat was relatively accurate in various ecological regions, and the error of corn was overestimated mainly in Hexi Corridor gobi district and Qinling Zhongshan canyon area, whereas the other crops were underestimated mainly in the hilly areas of the Loess Plateau and overestimated in the Hexi Corridor gobi district. In short, more crops are distributed in ecological areas that are naturally suitable for crop growth, which is in agreement with the actual situation. The comparison between the statistical planting area and RF-based downscaled estimates within the ecological district fully confirms the rationality of the RF-based regression model and the necessity of feature selection and postprocessing adjustments.

Figure 8. Comparison of agricultural statistics of six crops with RF-based downscaled estimates aggregated to the ecological district scale. Each subplot represents an ecological area: (a) hilly area of the Loess Plateau; (b) desert district; (c) Hexi Corridor gobi district; (d) Qinling Zhongshan canyon area; (e) gully region of the Loess Plateau; and (f) grassland meadow area of Gannan Plateau.

Figure 8. Comparison of agricultural statistics of six crops with RF-based downscaled estimates aggregated to the ecological district scale. Each subplot represents an ecological area: (a) hilly area of the Loess Plateau; (b) desert district; (c) Hexi Corridor gobi district; (d) Qinling Zhongshan canyon area; (e) gully region of the Loess Plateau; and (f) grassland meadow area of Gannan Plateau.

4.3.3. Comparison with CLCD’s cropland

To further analyze the reliability and rationality of the crop distribution trend estimated by the RF model, we also compared the proportion of cropland derived from 30-m LCLD data with the estimated total crop planting area on the 1-km pixel scale (). The total crop-planting area on a 1-km pixel refers to the sum of six types of subpixel crop-planting areas. We found that the correlation between the proportion of cropland and the total crop-planting area estimated by the four types of RF-based models (RF, RF-C, RF-FS, and RF-FS-C) was 0.75, 0.69, 0.76, and 0.69, with SSIM index values of 0.434, 0.436, 0.433, and 0.436, respectively. Notably, although the postprocessing adjustments significantly improved the matching between the estimated crop-planting area and statistical data, they may have had a negative impact on the spatial distribution pattern of crops and also may have had a side effect on the spatial distribution pattern of crops. These results revealed that the estimated subpixel crop area not only had good consistency with the CLCD cropland distribution but also retained a reasonable spatial distribution pattern. Because CLCD is an independent data source, it is by far one of the most accurate land use and land cover datasets. This analysis indirectly confirmed the rationality and reliability of our proposed subpixel crop-mapping method.

Figure 9. Comparison of cropland distribution from CLCD and estimated total crop-planting area on a 1-km pixel scale: (a) the proportion of cropland calculated using 30-m LCLD data and estimated total crop-planting area by (b) RF, (c) RF-C, (d) RF-FS, and (e) RF-FS-C, respectively.

Figure 9. Comparison of cropland distribution from CLCD and estimated total crop-planting area on a 1-km pixel scale: (a) the proportion of cropland calculated using 30-m LCLD data and estimated total crop-planting area by (b) RF, (c) RF-C, (d) RF-FS, and (e) RF-FS-C, respectively.

5. Discussion

5.1. The impact of model scale invariance hypothesis

A prerequisite assumption of this study is that when constructing the RF model, it is assumed that the causal relationship between the FCPA and various prediction factors remains unchanged at the county-level and 1-km grid scales. A similar assumption has been widely used in various downscaling studies (Lloyd et al. Citation2017; Wilby and Dawson Citation2013). With respect to the actual situation, however, the distribution characteristics of the prediction factors on the two scales were inevitably different. When the average value of all grids in a county was used as the predictive factor value corresponding to the county-level scale, it resulted in a significant reduction in the physical distribution threshold for each factor. Because our study area only had 87 counties, this led to insufficient sample representativeness. It is well known that the performance of the RF model depends on the representativeness of the training datasets. Thus, using the model trained on a coarse scale to predict the distribution of subpixel crop-plating areas at the fine scale introduced a certain degree of uncertainty, which may have led to a large estimation error.

5.2. Contribution analysis of feature selection and postprocessing adjustment

Based on the previous description, we determined that the RF model including feature selection combined with postprocessing correction, had good accuracy and reliability for estimating the subpixel crop-planting area. To understand the contribution of feature selection and postprocessing adjustment to the subpixel crop area retrievals, we used a pure RF-based model as the cornerstone, and then calculated the differences between the subpixel crop area retrieved by the RF-based model that included only adjustment (i.e. RF-C), feature selection (i.e. RF-FS), and both and the cornerstone model (i.e. RF-FS-C), as shown in . We concluded that the contribution of feature selection to the estimation accuracy of the six crops was 12.28%, 19.09%, 13.05%, 6.75%, 4.69%, and 13.26%, whereas the postprocessing adjustment contributed 87.72%, 80.91%, 86.95%, 93.25%, 95.31%, and 86.74%, respectively. Thus, the effects of feature selection and postprocessing adjustment were not fixed but rather varied among the different crop types, reflecting variations in the crop-planting structures. Feature selection alleviated the problem of low-value overestimation in pure RF models to a certain extent, whereas adjustment effectively corrected for both low- and high-value underestimation.

Figure 10. The effect of feature selection and postprocessing adjustment. The first to sixth rows of subplots from top to bottom represent six crops of wheat, maize, oil-bearing, vegetable, orchards, and other crops, respectively. The first to third columns from left to right represent the effects of (a1∼a6) adjustment, (b1∼b6) feature selection, and (c1∼c6) a combination of the two, respectively.

Figure 10. The effect of feature selection and postprocessing adjustment. The first to sixth rows of subplots from top to bottom represent six crops of wheat, maize, oil-bearing, vegetable, orchards, and other crops, respectively. The first to third columns from left to right represent the effects of (a1∼a6) adjustment, (b1∼b6) feature selection, and (c1∼c6) a combination of the two, respectively.

6. Conclusion

We proposed a new methodology for FCPA mapping based on collaborative medium-resolution geographic grid data and county-level administrative unit agricultural statistical data. First, we executed binary PSO-based feature selection to automatically select the optimal combination of independent variables and to provide valuable insight for understanding variable importance. Second, we constructed an RF regression model to allocate agricultural statistical data at the county level to a grid scale of 1 km × 1 km. Finally, we proposed an IAGA postprocessing correction to reconcile the inconsistency between the statistical data and estimation results. This study considered Gansu Province, one of the regions with the richest crop varieties in China, as the research area and verified the reliability and stability of the method using six main crops (i.e. wheat, maize, oil-bearing, vegetable, orchards, and other crops) as the targets under the circumstances of landscape fragmentation, which have had negative impacts on crop mapping accuracy. These results showed that the subpixel crop distribution maps based on RF regression not only had good consistency with agricultural statistical data in terms of quantity, but also preserved the spatial detail distribution characteristics of geographic data. Compared with the statistical data, the estimated FCPA achieved a high accuracy of R2 above 0.97, RMSE below 1% for all types of crops, and AB values of 2.83%, 1.53%, 2.93%, 4.61%, 5.24%, and 3.27% for six crops. Compared with the subpixel crop area estimation model based solely on RF, the contribution of feature selection to accuracy improvement was 12.28%, 19.09%, 13.05%, 6.75%, 4.69%, and 13.26%, respectively, and the IAGA postprocessing adjustment contributed 87.72%, 80.91%, 86.95%, 93.25%, 95.31%, and 86.74% to accuracy improvement for six crops, respectively. Moreover, the consistency between the cropland areas obtained in this study and the independent CLCD was 0.70.

These promising results suggest that harvesting the complementary characteristics of geographic data and agricultural statistics should be considered when exploring their relationship with subpixel crop acreage. The general approach described in this study can be applied to other geographical regions around the globe with similar availability of data and also can provide new support for the development of a crop map in large regions. Additionally, this approach enriches and develops technical methods for the fusion of geographic and nongeographic data, which can provide new references for the collaborative fusion of multisource data for FCPA mapping. Blending the advantage of two different sources of information may inadvertently introduce new errors because of inconsistencies, especially when agricultural statistical data are not ‘accurate and perfect,’ and these errors directly affect the accuracy of the final estimation results. Additionally, because of the inherent limitations of RF regression methods, our proposed method is not truly a synergistic estimation of multiple crop areas, and future work is needed to explore the benefit of incorporating new methods of ‘multiple input and multiple output.’ In short, the results of this study demonstrate the benefits of integrating machine learning techniques, remote sensing, and agricultural statistics to perform more reliable crop-planting area mapping. The study provides a novel solution that downscales county-level agricultural statistics to 1-km spatial resolution FCPA maps.

Acknowledgments

The authors declare that they have no known competing financial interests or personal relationships that could have influenced the work reported in this study.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The data supporting the findings of this study are available from the website provided in the manuscript.

Correction Statement

This article has been corrected with minor changes. These changes do not impact the academic content of the article.

Additional information

Funding

This work was supported by the Open Research Program of the International Research Center of Big Data for Sustainable Development Goals (grant number CBAS2023ORP04), the Basic Research Innovative Groups of Gansu province, China(grant nomber 21JR7RA068), Natural Science Foundation of Gansu Province, China (grant number 22JR5RA060), and the National Natural Science Foundation of China (grant number 42361060).

References

  • Akbari, E., A. D. Boloorani, N. N. Samany, S. Hamzeh, S. Soufizadeh, and S. Pignatti. 2020. “Crop Mapping Using Random Forest and Particle Swarm Optimization Based on Multi-Temporal Sentinel-2.” Remote Sensing 12 (9): 1449. https://doi.org/10.3390/rs12091449.
  • An, D., Y. H. Du, R. Berndtsson, Z. R. Niu, L. N. Zhang, and F. F. Yuan. 2020. “Evidence of Climate Shift for Temperature and Precipitation Extremes Across Gansu Province in China.” Theoretical and Applied Climatology 139 (3–4): 1137–1149. https://doi.org/10.1007/s00704-019-03041-1.
  • Atzberger, C., and F. Rembold. 2013. “Mapping the spatial distribution of winter crops at sub-pixel level using AVHRR NDVI time series and neural nets.” Remote Sensing 5 (3): 1335–1354. https://doi.org/10.3390/rs5031335.
  • Bargiel, D. 2017. “A new Method for Crop Classification Combining Time Series of Radar Images and Crop Phenology Information.” Remote Sensing of Environment 198: 369–383. https://doi.org/10.1016/j.rse.2017.06.022.
  • Begue, A., D. Arvor, B. Bellon, J. Betbeder, D. de Abelleyra, R. P. D. Ferraz, V. Lebourgeois, C. Lelong, M. Simoes, and S. R. Veron. 2018. “Remote Sensing and Cropping Practices: A Review.” Remote Sensing 10 (1): 32. https://doi.org/10.3390/rs10010099.
  • Breiman, L. 2001. “Random forests.” Machine learning 45: 5–32.
  • Chen, J., P. Jonsson, M. Tamura, Z. H. Gu, B. Matsushita, and L. Eklundh. 2004. “A Simple Method for Reconstructing a High-Quality NDVI Time-Series Data set Based on the Savitzky-Golay Filter.” Remote Sensing of Environment 91 (3–4): 332–344. https://doi.org/10.1016/j.rse.2004.03.014.
  • Chen, Y. L., D. S. Lu, E. Moran, M. Batistella, L. V. Dutra, I. D. Sanches, R. F. B. da Silva, J. F. Huang, A. J. B. Luiz, and M. A. F. de Oliveira. 2018. “Mapping Croplands, Cropping Patterns, and Crop Types Using MODIS Time-Series Data.” International Journal of Applied Earth Observation and Geoinformation 69: 133–147. https://doi.org/10.1016/j.jag.2018.03.005.
  • Clauss, K., H. M. Yan, and C. Kuenzer. 2016. “Mapping Paddy Rice in China in 2002, 2005, 2010 and 2014 with MODIS Time Series.” Remote Sensing 8 (5): 22. https://doi.org/10.3390/rs8050434.
  • Dong, J., X. Xiao, W. Kou, Y. Qin, G. Zhang, L. Li, C. Jin, et al. 2015. “Tracking the Dynamics of Paddy Rice Planting Area in 1986-2010 Through Time Series Landsat Images and Phenology-Based Algorithms.” Remote Sensing of Environment 160: 99–113. https://doi.org/10.1016/j.rse.2015.01.004.
  • Foerster, S., K. Kaden, M. Foerster, and S. Itzerott. 2012. “Crop Type Mapping Using Spectral-Temporal Profiles and Phenological Information.” Computers and Electronics in Agriculture 89: 30–40. https://doi.org/10.1016/j.compag.2012.07.015.
  • Foley, J. A., R. DeFries, G. P. Asner, C. Barford, G. Bonan, S. R. Carpenter, F. S. Chapin, et al. 2005. “Global Consequences of Land use.” Science 309 (5734): 570–574. https://doi.org/10.1126/science.1111772.
  • Foley, J. A., N. Ramankutty, K. A. Brauman, E. S. Cassidy, J. S. Gerber, M. Johnston, N. D. Mueller, et al. 2011. “Solutions for a Cultivated Planet.” Nature 478 (7369): 337–342. https://doi.org/10.1038/nature10452.
  • Fritz, S., L. See, I. McCallum, L. Z. You, A. Bun, E. Moltchanova, M. Duerauer, et al. 2015. “Mapping Global Cropland and Field Size.” Global Change Biology 21 (5): 1980–1992. https://doi.org/10.1111/gcb.12838.
  • Frolking, S., J. J. Qiu, S. Boles, X. M. Xiao, J. Y. Liu, Y. H. Zhuang, C. S. Li, and X. G. Qin. 2002. “Combining Remote Sensing and Ground Census Data to Develop new Maps of the Distribution of Rice Agriculture in China.” Global Biogeochemical Cycles 16 (4): 10. https://doi.org/10.1029/2001gb001425.
  • Gilbertson, J. K., and A. van Niekerk. 2017. “Value of Dimensionality Reduction for Crop Differentiation with Multi-Temporal Imagery and Machine Learning.” Computers and Electronics in Agriculture 142: 50–58. https://doi.org/10.1016/j.compag.2017.08.024.
  • Gumma, M. K., P. S. Thenkabail, A. Maunahan, S. Islam, and A. Nelson. 2014. “Mapping Seasonal Rice Cropland Extent and Area in the High Cropping Intensity Environment of Bangladesh Using MODIS 500 m Data for the Year 2010.” Isprs Journal of Photogrammetry and Remote Sensing 91: 98–113. https://doi.org/10.1016/j.isprsjprs.2014.02.007.
  • Hartfield, K. A., S. E. Marsh, C. D. Kirk, and Y. Carriere. 2013. “Contemporary and Historical Classification of Crop Types in Arizona.” International Journal of Remote Sensing 34 (17): 6024–6036. https://doi.org/10.1080/01431161.2013.793861.
  • He, D., Q. Shi, X. P. Liu, Y. F. Zhong, G. S. Xia, and L. P. Zhang. 2022. “Generating Annual High Resolution Land Cover Products for 28 Metropolises in China Based on a Deep Super-Resolution Mapping Network Using Landsat Imagery.” Giscience & Remote Sensing 59 (1): 2036–2067. https://doi.org/10.1080/15481603.2022.2142727.
  • Hu, Q., W. B. Wu, Q. Song, Q. Y. Yu, M. Lu, P. Yang, H. J. Tang, and Y. Q. Long. 2016. “Extending the Pairwise Separability Index for Multicrop Identification Using Time-Series MODIS Images.” Ieee Transactions on Geoscience and Remote Sensing 54 (11): 6349–6361. https://doi.org/10.1109/tgrs.2016.2581210.
  • Hu, Q., H. Yin, M. A. Friedl, L. Z. You, Z. L. Li, H. J. Tang, and W. B. Wu. 2021. “Integrating Coarse-Resolution Images and Agricultural Statistics to Generate sub-Pixel Crop Type Maps and Reconciled Area Estimates.” Remote Sensing of Environment 258: 14. https://doi.org/10.1016/j.rse.2021.112365.
  • Jonsson, P., and L. Eklundh. 2004. “TIMESAT - a Program for Analyzing Time-Series of Satellite Sensor Data.” Computers & Geosciences 30 (8): 833–845. https://doi.org/10.1016/j.cageo.2004.05.006.
  • Kennedy, J., and R. Eberhart. 1995. “Particle Swarm Optimization.” Proceedings of ICNN'95-International Conference on Neural Networks 4. https://doi.org/10.1109/ICNN.1995.488968.
  • Lark, T. J., I. H. Schelly, and H. K. Gibbs. 2021. “Accuracy, Bias, and Improvements in Mapping Crops and Cropland Across the United States Using the USDA Cropland Data Layer.” Remote Sensing 13 (5): 29. https://doi.org/10.3390/rs13050968.
  • Li, X., M. Feng, Y. H. Ran, Y. Su, F. Liu, C. L. Huang, H. F. Shen, et al. 2023. “Big Data in Earth System Science and Progress Towards a Digital Twin.” Nature Reviews Earth & Environment.14, https://doi.org/10.1038/s43017-023-00409-w.
  • Li, X. H., J. L. Hou, and C. L. Huang. 2021. “High-Resolution Gridded Livestock Projection for Western China Based on Machine Learning.” Remote Sensing 13 (24): 21. https://doi.org/10.3390/rs13245038.
  • Liu, W., J. Dong, K. L. Xiang, S. Wang, W. Han, and W. P. Yuan. 2018. “A sub-Pixel Method for Estimating Planting Fraction of Paddy Rice in Northeast China.” Remote Sensing of Environment 205: 305–314. https://doi.org/10.1016/j.rse.2017.12.001.
  • Liu, J. Y., W. H. Kuang, Z. X. Zhang, X. L. Xu, Y. W. Qin, J. Ning, W. C. Zhou, et al. 2014. “Spatiotemporal Characteristics, Patterns, and Causes of Land-use Changes in China Since the Late 1980s.” Journal of Geographical Sciences 24 (2): 195–210. https://doi.org/10.1007/s11442-014-1082-6.
  • Lloyd, C. T., A. Sorichetta, and A. J. Tatem. 2017. “High resolution global gridded data for use in population studies.” Scientific data 4 (1): 1–17. https://doi.org/10.1038/sdata.2017.1.
  • Lobell, D. B., and G. P. Asner. 2004. “Cropland Distributions from Temporal Unmixing of MODIS Data.” Remote Sensing of Environment 93 (3): 412–422. https://doi.org/10.1016/j.rse.2004.08.002.
  • Luo, Y. C., Z. Zhang, Z. Y. Li, Y. Chen, L. L. Zhang, J. Cao, and F. L. Tao. 2020. “Identifying the Spatiotemporal Changes of Annual Harvesting Areas for Three Staple Crops in China by Integrating Multi-Data Sources.” Environmental Research Letters 15 (7): 15. https://doi.org/10.1088/1748-9326/ab80f0.
  • Masoudi-Sobhanzadeh, Y., H. Motieghader, and A. Masoudi-Nejad. 2019. “FeatureSelect: A Software for Feature Selection Based on Machine Learning Approaches.” Bmc Bioinformatics 20: 17. https://doi.org/10.1186/s12859-019-2754-0.
  • Massey, R., T. T. Sankey, R. G. Congalton, K. Yadav, P. S. Thenkabail, M. Ozdogan, and A. J. S. Meador. 2017. “MODIS Phenology-Derived, Multi-Year Distribution of Conterminous US Crop Types.” Remote Sensing of Environment 198: 490–503. https://doi.org/10.1016/j.rse.2017.06.033.
  • Molotoks, A., E. Stehfest, J. Doelman, F. Albanito, N. Fitton, T. P. Dawson, and P. Smith. 2018. “Global Projections of Future Cropland Expansion to 2050 and Direct Impacts on Biodiversity and Carbon Storage.” Global Change Biology 24 (12): 5895–5908. https://doi.org/10.1111/gcb.14459.
  • Monfreda, C., N. Ramankutty, and J. A. Foley. 2008. “Farming the Planet: 2. Geographic Distribution of Crop Areas, Yields, Physiological Types, and net Primary Production in the Year 2000.” Global Biogeochemical Cycles 22 (1): 19. https://doi.org/10.1029/2007gb002947.
  • Nicolas, G., T. P. Robinson, G. R. W. Wint, G. Conchedda, G. Cinardi, and M. Gilbert. 2016. “Using Random Forest to Improve the Downscaling of Global Livestock Census Data.” Plos One 11 (3): 16. https://doi.org/10.1371/journal.pone.0150424.
  • Pan, Y. Z., L. Li, J. S. Zhang, S. L. Liang, X. F. Zhu, and D. Sulla-Menashe. 2012. “Winter Wheat Area Estimation from MODIS-EVI Time Series Data Using the Crop Proportion Phenology Index.” Remote Sensing of Environment 119: 232–242. https://doi.org/10.1016/j.rse.2011.10.011.
  • Pena-Barragan, J. M., M. K. Ngugi, R. E. Plant, and J. Six. 2011. “Object-based Crop Identification Using Multiple Vegetation Indices, Textural Features and Crop Phenology.” Remote Sensing of Environment 115 (6): 1301–1316. https://doi.org/10.1016/j.rse.2011.01.009.
  • Pittman, K., M. C. Hansen, I. Becker-Reshef, P. V. Potapov, and C. O. Justice. 2010. “Estimating Global Cropland Extent with Multi-Year MODIS Data.” Remote Sensing 2 (7): 1844–1863. https://doi.org/10.3390/rs2071844.
  • Ramankutty, N., A. T. Evan, C. Monfreda, and J. A. Foley. 2008. “Farming the Planet: 1. Geographic Distribution of Global Agricultural Lands in the Year 2000.” Global Biogeochemical Cycles 22 (1): 19. https://doi.org/10.1029/2007gb002952.
  • Sakamoto, T., B. D. Wardlow, and A. A. Gitelson. 2011. “Detecting Spatiotemporal Changes of Corn Developmental Stages in the US Corn Belt Using MODIS WDRVI Data.” Ieee Transactions on Geoscience and Remote Sensing 49 (6): 1926–1936. https://doi.org/10.1109/tgrs.2010.2095462.
  • Salmon, J. M., M. A. Friedl, S. Frolking, D. Wisser, and E. M. Douglas. 2015. “Global Rain-fed, Irrigated, and Paddy Croplands: A new High Resolution map Derived from Remote Sensing, Crop Inventories and Climate Data.” International Journal of Applied Earth Observation and Geoinformation 38: 321–334. https://doi.org/10.1016/j.jag.2015.01.014.
  • See, L. D., S. Fritz, L. Z. You, N. Ramankutty, M. Herrero, C. Justice, I. Becker-Reshef, et al. 2015. “Improved Global Cropland Data as an Essential Ingredient for Food Security.” Global Food Security-Agriculture Policy Economics and Environment 4: 37–45. https://doi.org/10.1016/j.gfs.2014.10.004.
  • Skakun, S., B. Franch, E. Vermote, J. C. Roger, I. Becker-Reshef, C. Justice, and N. Kussul. 2017. “Early Season Large-Area Winter Crop Mapping Using MODIS NDVI Data, Growing Degree Days Information and a Gaussian Mixture Model.” Remote Sensing of Environment 195: 244–258. https://doi.org/10.1016/j.rse.2017.04.026.
  • Song, X. P., P. V. Potapov, A. Krylov, L. King, C. M. Di Bella, A. Hudson, A. Khan, B. Adusei, S. V. Stehman, and M. C. Hansen. 2017. “National-scale Soybean Mapping and Area Estimation in the United States Using Medium Resolution Satellite Imagery and Field Survey.” Remote Sensing of Environment 190: 383–395. https://doi.org/10.1016/j.rse.2017.01.008.
  • Tang, G. A. 2019. “Digital elevation model of China (1KM).” https://www.tpdc.ac.cn/zh-hans/data/12e91073-0181-44bf-8308-c50e5bd9a734/.
  • Thenkabail, P. S., J. W. Knox, M. Ozdogan, M. K. Gumma, R. G. Congalton, Z. T. Wu, C. Milesi, et al. 2012. “Assessing Future Risks to Agricultural Productivity, Water Resources and Food Security: How Can Remote Sensing Help?” Photogrammetric Engineering and Remote Sensing 78 (8): 773–782.
  • Thenkabail, P. S., and Z. T. Wu. 2012. “An Automated Cropland Classification Algorithm (ACCA) for Tajikistan by Combining Landsat, MODIS, and Secondary Data.” Remote Sensing 4 (10): 2890–2918. https://doi.org/10.3390/rs4102890.
  • UNEP-WCMC. 2022. “Protected Area Profile for China from the World Database on Protected Areas.” https://www.protectedplanet.net.
  • Van Niel, T. G., and T. R. McVicar. 2004. “Determining Temporal Windows for Crop Discrimination with Remote Sensing: A Case Study in South-Eastern Australia.” Computers and Electronics in Agriculture 45 (1–3): 91–108. https://doi.org/10.1016/j.compag.2004.06.003.
  • Wang, P., L. G. Wang, H. Leung, and G. Zhang. 2020. “Super-Resolution Mapping Based on Spatial-Spectral Correlation for Spectral Imagery.” Ieee Transactions on Geoscience and Remote Sensing 59 (3): 2256–2268. https://doi.org/10.1109/tgrs.2020.3004353.
  • Wang, Y., Q. Zhang, S. P. Wang, J. S. Wang, and Y. B. Yao. 2017. “Characteristics of Agro-Meteorological Disasters and Their Risk in Gansu Province Against the Background of Climate Change.” Natural Hazards 89 (2): 899–921. https://doi.org/10.1007/s11069-017-2999-8.
  • Wardlow, B. D., and S. L. Egbert. 2008. “Large-area Crop Mapping Using Time-Series MODIS 250 m NDVI Data: An Assessment for the US Central Great Plains.” Remote Sensing of Environment 112 (3): 1096–1116. https://doi.org/10.1016/j.rse.2007.07.019.
  • Wilby, R. L., and C. W. Dawson. 2013. “The Statistical DownScaling Model: Insights from one Decade of Application.” International Journal of Climatology 33 (7): 1707–1719. https://doi.org/10.1002/joc.3544.
  • Wu, B. F., and Q. Z. Li. 2012. “Crop Planting and Type Proportion Method for Crop Acreage Estimation of Complex Agricultural Landscapes.” International Journal of Applied Earth Observation and Geoinformation 16: 101–112. https://doi.org/10.1016/j.jag.2011.12.006.
  • Wu, B. F., J. H. Meng, Q. Z. Li, N. N. Yan, X. Du, and M. Zhang. 2014. “Remote Sensing-Based Global Crop Monitoring: Experiences with China's CropWatch System.” International Journal of Digital Earth 7 (2): 113–137. https://doi.org/10.1080/17538947.2013.821185.
  • Xiao, X. M., S. Boles, J. Y. Liu, D. F. Zhuang, S. Frolking, C. S. Li, W. Salas, and B. Moore. 2005. “Mapping Paddy Rice Agriculture in Southern China Using Multi-Temporal MODIS Images.” Remote Sensing of Environment 95 (4): 480–492. https://doi.org/10.1016/j.rse.2004.12.009.
  • Xiong, J., P. S. Thenkabail, M. K. Gumma, P. Teluguntla, J. Poehnelt, R. G. Congalton, K. Yadav, and D. Thau. 2017. “Automated Cropland Mapping of Continental Africa Using Google Earth Engine Cloud Computing.” Isprs Journal of Photogrammetry and Remote Sensing 126: 225–244. https://doi.org/10.1016/j.isprsjprs.2017.01.019.
  • Xu, Y., Q. Du, and N. H. Younan. 2017. “Particle Swarm Optimization-Based Band Selection for Hyperspectral Target Detection.” Ieee Geoscience and Remote Sensing Letters 14 (4): 554–558. https://doi.org/10.1109/lgrs.2017.2658666.
  • Xu, X. L., J. Y. Liu, S. W. Zhang, R. D. Li, C. Z. Yan, and S. X. Wu. 2018. “Multi-Period Land Use and Land Cover Remote Sensing Monitoring Data Set in China.” https://www.resdc.cn/DOI/DOI.aspx?DOIID=54.
  • Xue, B., M. J. Zhang, and W. N. Browne. 2012. “Particle Swarm Optimization for Feature Selection in Classification: A Multi-Objective Approach.” Ieee Transactions on Cybernetics 43 (6): 1656–1671. https://doi.org/10.1109/tsmcb.2012.2227469.
  • Xue, B., M. J. Zhang, and W. N. BrowneSchool. 2014. “Particle Swarm Optimisation for Feature Selection in Classification: Novel Initialisation and Updating Mechanisms.” Applied Soft Computing 18: 261–276. https://doi.org/10.1016/j.asoc.2013.09.018.
  • Yang, J., and X. Huang. 2021. “The 30 m Annual Land Cover Dataset and Its Dynamics in China from 1990 to 2019.” Earth System Science Data 13 (8): 3907–3925. https://doi.org/10.5194/essd-13-3907-2021.
  • Yin, L. K., N. S. You, G. L. Zhang, J. X. Huang, and J. W. Dong. 2020. “Optimizing Feature Selection of Individual Crop Types for Improved Crop Mapping.” Remote Sensing 12 (1): 20. https://doi.org/10.3390/rs12010162.
  • You, L. Z., and S. Wood. 2006. “An Entropy Approach to Spatial Disaggregation of Agricultural Production.” Agricultural Systems 90 (1-3): 329–347. https://doi.org/10.1016/j.agsy.2006.01.008.
  • You, L. Z., S. Wood, U. Wood-Sichra, and W. B. Wu. 2014. “Generating Global Crop Distribution Maps: From Census to Grid.” Agricultural Systems 127: 53–60. https://doi.org/10.1016/j.agsy.2014.01.002.
  • Zhang, J. H., L. L. Feng, and F. M. Yao. 2014. “Improved Maize Cultivated Area Estimation Over a Large Scale Combining MODIS-EVI Time Series Data and Crop Phenological Information.” Isprs Journal of Photogrammetry and Remote Sensing 94: 102–113. https://doi.org/10.1016/j.isprsjprs.2014.04.023.
  • Zhang, G. L., X. M. Xiao, J. W. Dong, W. L. Kou, C. Jin, Y. W. Qin, Y. T. Zhou, J. Wang, M. A. Menarguez, and C. Biradar. 2015. “Mapping Paddy Rice Planting Areas Through Time Series Analysis of MODIS Land Surface Temperature and Vegetation Index Data.” Isprs Journal of Photogrammetry and Remote Sensing 106: 157–171. https://doi.org/10.1016/j.isprsjprs.2015.05.011.
  • Zhong, L. H., P. Gong, and G. S. Biging. 2014. “Efficient Corn and Soybean Mapping with Temporal Extendability: A Multi-Year Experiment Using Landsat Imagery.” Remote Sensing of Environment 140: 1–13. https://doi.org/10.1016/j.rse.2013.08.023.
  • Zhong, L. H., L. N. Hu, L. Yu, P. Gong, and G. S. Biging. 2016. “Automated Mapping of Soybean and Corn Using Phenology.” Isprs Journal of Photogrammetry and Remote Sensing 119: 151–164. https://doi.org/10.1016/j.isprsjprs.2016.05.014.
  • Zhong, L. H., L. Hu, H. Zhou, and X. Tao. 2019. “Deep Learning Based Winter Wheat Mapping Using Statistical Data as Ground References in Kansas and Northern Texas, US.” Remote Sensing of Environment 233: 17. https://doi.org/10.1016/j.rse.2019.111411.
  • Zhu, C. M., X. Zhang, and Q. H. Huang. 2019. “Mapping Fractional Cropland Covers in Brazil Through Integrating LSMA and SDI Techniques Applied to MODIS Imagery.” International Journal of Agricultural and Biological Engineering 12 (1): 192–200. https://doi.org/10.25165/j.ijabe.20191201.4419.