
Thematic accuracy assessment of the NLCD 2019 land cover for the conterminous United States

Article: 2181143 | Received 15 Sep 2022, Accepted 10 Feb 2023, Published online: 01 Mar 2023

ABSTRACT

The National Land Cover Database (NLCD), a product suite produced through the Multi-Resolution Land Characteristics (MRLC) consortium, is an operational land cover monitoring program. Starting from a base year of 2001, NLCD releases a land cover database every 2–3 years. The recent release of NLCD2019 extends the database to 18 years. We implemented a stratified random sample to collect land cover reference data for the 2016 and 2019 components of the NLCD2019 database at Level II and Level I of the classification hierarchy. For both dates, Level II land cover overall accuracies (OA) were 77.5% ± 1% (± value is the standard error) when agreement was defined as a match between the map label and primary reference label only, and increased to 87.1% ± 0.7% when agreement was defined as a match between the map label and either the primary or alternate reference label. At Level I of the classification hierarchy, land cover OA was 83.1% ± 0.9% for both 2016 and 2019 when agreement was defined as a match between the map label and primary reference label only, and increased to 90.3% ± 0.7% when agreement also included the alternate reference label. The Level II and Level I OA for the 2016 land cover in the NLCD2019 database were 5% higher compared to the 2016 land cover component of the NLCD2016 database when agreement was defined as a match between the map label and primary reference label only. No improvement was realized by the NLCD2019 database when agreement also included the alternate reference label. User's accuracies (UA) for forest loss and grass gain were >70% when agreement included either the primary or alternate label, and UA was generally <50% for all other change themes. Producer's accuracies (PA) were >70% for grass loss and gain and water gain, and generally <50% for the other change themes. We conducted a post-analysis review of map-reference agreement to identify patterns of disagreement, and these findings are discussed in the context of potential adjustments to mapping and reference data collection procedures that may lead to improved map accuracy going forward.

1. Introduction

The United States National Land Cover Database (NLCD) (https://www.mrlc.gov) was initiated in the late 1990s (Loveland and Shaw 1996) to provide land cover for the nation from the Landsat satellites at their native spatial resolution of 30 x 30 m. The initiative has since grown into a land cover monitoring program that produces time-integrated (land cover at time t informs land cover at time t-1) land cover at 2- to 3-year intervals (Homer et al. 2020; Yang et al. 2018). The latest release, NLCD2019, includes land cover for 2001, 2004, 2006, 2008, 2011, 2013, 2016, and 2019. NLCD is recognized as a national geospatial data asset by the U.S. Federal Geographic Data Committee (FGDC 2021).

NLCD map production methodology is described thoroughly elsewhere (Homer et al. 2020; Jin et al. 2019; Yang et al. 2018). NLCD uses spectral (e.g. Landsat bands, indices), spatial (image objects), temporal (spectral succession and trajectory), and ancillary data as input into its land cover classification and post-classification processes. Random forest models are used for classification. Landsat 8 (launched 11 February 2013) images were used for the 2013, 2016, and 2019 dates of land cover in NLCD2019, and Landsat 5 was used for all earlier dates. Two of the main enhancements adopted for NLCD2019 production that were not incorporated in previous releases of NLCD were the use of surface rather than top-of-the-atmosphere Landsat reflectance data (USGS 2021, 2022) and image composites. Use of surface rather than top-of-the-atmosphere reflectance corrects for atmospheric effects (Wulder et al. 2019). Image compositing follows from the widely utilized maximum Normalized Difference Vegetation Index (NDVI) methodology (Roy et al. 2010 and citations therein) first reported by Holben (1986). Compositing compares a suite of images over a given time period (e.g. season) and selects the "best" pixel on a per-pixel basis (Jin et al., in press; Qiu et al. 2023). The method is based on the logic that the highest value is least corrupted by atmospheric effects and other factors and is also an efficient data reduction strategy (Holben 1986; Roy et al. 2010). NLCD image compositing (Jin et al., in press) was based on a median value concept (Flood 2013). This method provided superior change detection results when compared to a suite of image compositing methods (Qiu et al. 2023). Image compositing was used to create leaf-on (May 1 to September 30) and leaf-off (November 1 to March 1) images for each date of land cover (Jin et al., in press).
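To make the compositing idea concrete, the following minimal sketch selects a per-pixel medoid (the multi-dimensional median concept of Flood 2013) from a stack of cloud-masked observations. It is only an illustration of the concept under assumed inputs; the array layout and function name are hypothetical, and NLCD's production compositing (Jin et al., in press) applies additional selection rules not shown here.

```python
import numpy as np

def medoid_composite(stack):
    """Per-pixel medoid composite: for each pixel, pick the date whose spectral
    vector minimizes the summed Euclidean distance to the spectral vectors of
    all other clear observations of that pixel.

    stack : float array of shape (dates, bands, rows, cols), with cloud/shadow
            observations masked as NaN.
    Returns an array of shape (bands, rows, cols).
    """
    n_dates, n_bands, n_rows, n_cols = stack.shape
    composite = np.full((n_bands, n_rows, n_cols), np.nan)
    for i in range(n_rows):
        for j in range(n_cols):
            obs = stack[:, :, i, j]                      # (dates, bands)
            valid = ~np.isnan(obs).any(axis=1)           # dates with all bands clear
            if not valid.any():
                continue
            v = obs[valid]                               # (k, bands)
            d = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=2)  # pairwise distances
            composite[:, i, j] = v[d.sum(axis=1).argmin()]             # medoid observation
    return composite

# Toy example: 4 dates x 6 bands x 2 x 2 pixels of random "reflectance"
demo = np.random.rand(4, 6, 2, 2)
demo[0, :, 0, 0] = np.nan                                # simulate one cloud-masked observation
print(medoid_composite(demo).shape)                      # (6, 2, 2)
```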

Formal, statistically rigorous accuracy assessment is a prescribed component of NLCD because of its overarching monitoring objective (Homer et al. 2020; Yang et al. 2018) and its status as an operational program (Hansen and Loveland 2012). Accuracy assessment has followed all previous releases of NLCD (Stehman et al. 2003; Wickham et al. 2021, 2017, 2013, 2010). NLCD land cover data are used widely, and therefore accuracy assessment is needed to support informed use of the data (Stehman and Czaplewski 1998). For example, NLCD is used to derive annual U.S. greenhouse gas emission estimates reported to the United Nations Framework Convention on Climate Change (UNFCCC) (EPA 2022). Accuracy assessment results also are used to improve the classification methods used for subsequent releases of NLCD data (Homer et al. 2020).

To denote the different versions of NLCD (based on date of release) and the year of land cover mapped, we use the terminology "NLCDxxxx yyyy", where "xxxx" is the NLCD release date and "yyyy" is the nominal date of the land cover. For example, NLCD2019 2013 refers to the 2013 land cover in the NLCD2019 database. This terminology is identical to that used to report accuracies for NLCD2016 (Wickham et al. 2021). The release date (e.g. NLCD2019) refers to the nominal date of image acquisition, not the year the data were posted to the NLCD website (https://www.mrlc.gov). Each release of NLCD data includes updated versions of previous releases. NLCD2019 includes updated versions of each land cover date (2001, 2004, 2006, 2008, 2011, 2013, and 2016) released as part of NLCD2016. Prior accuracy assessments have focused on land cover released at a 5-year cycle; accuracy assessment of NLCD2016 focused on the nominal dates of 2001, 2006, 2011, and 2016. Here, we focus on the accuracy of the nominal dates of 2016 and 2019, reporting single-date and 2016–2019 change results for the NLCD2019 database for the conterminous United States (i.e. United States, excluding Alaska and Hawaii). This is the first formal assessment of an NLCD interstitial date product and consequently provides an evaluation of the accuracy of a 3-year change period versus the 5-year change periods of previous assessments.

2. Methods

The accuracy assessment methodology for NLCD2019 was nearly identical to that used for NLCD2016 (Wickham et al. 2021). The main differences were: 1) a reduction in the sample size to 3245 from the 4629 used for NLCD2016, 2) inclusion of an additional stratum, and 3) a post-analysis review (Section 2.4) of factors that may have contributed to disagreement between map and reference labels. The reduction of the sample size, necessitated by programmatic changes, will reduce the precision of the accuracy estimators relative to the NLCD2016 assessment, but the accuracy estimators will remain unbiased. However, because the standard errors of the accuracy estimates are directly related to the sample size and not the percent of the population sampled (Stehman 2001, 730), the sample size of 3245 is still large enough to yield acceptably small standard errors. The additional stratum was included to increase the sample size for changes from shrubland to grassland and vice versa. Grassland and shrubland comprise about 25% of the conterminous United States, and change among these classes arises from ecological disturbances (e.g. fire), timber harvest and regeneration activities, and other agents of change. The post-analysis review, which had not been undertaken in previous NLCD accuracy assessments, was a systematic assessment of common error patterns. The main objectives of the post-analysis review were to inform future map production and future reference data collection.
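To illustrate why the smaller sample still yields acceptably small standard errors, the back-of-the-envelope calculation below compares the standard error of an estimated proportion for the two total sample sizes under a simple random sampling assumption (the actual design is stratified, so the numbers are only indicative); the accuracy value of 0.80 is hypothetical.

```python
import math

def se_proportion(p, n, N=None):
    """Standard error of an estimated proportion under simple random sampling;
    the finite population correction (1 - n/N) is negligible when n << N."""
    fpc = 1.0 - n / N if N else 1.0
    return math.sqrt(fpc * p * (1.0 - p) / n)

N = 8_665_213_260                  # pixels in the population (Table 2 caption)
for n in (4629, 3245):             # NLCD2016 vs NLCD2019 total sample sizes
    print(f"n = {n}: SE = {100 * se_proportion(0.80, n, N):.2f} percentage points")
# Both standard errors are well under 1%, and the sampled fraction (~4e-7 of the
# population) plays essentially no role.
```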

2.1. Sampling design

Sample selection was based on 22 strata, defined by the NLCD2019 product, that included 15 no-change strata and 7 change strata. The 15 no-change strata were all NLCD Level II land cover classes (Table 1) except perennial ice and snow (class = 12), which was not included because it comprised less than 1% of the map (i.e. population). The stratification and sample size allocation were implemented primarily for the objective of estimating user's accuracy of the rare land cover and change types. That is, the strata allowed us to increase the sample size for these rare classes to reduce the standard errors of the user's accuracy estimates. The 7 change strata were:

  1. Urban gain: not urban in 2016 and urban in 2019;

  2. Forest loss: forest in 2016 and not forest in 2019, except for forest to urban (in urban gain stratum);

  3. Forest gain: not forest in 2016 and forest in 2019;

  4. Agriculture loss: agriculture in 2016 and not agriculture in 2019, except for agriculture to urban (in urban gain stratum) and agriculture to forest (in forest gain stratum);

  5. Agriculture gain: not agriculture in 2016 and agriculture in 2019, except for forest to agriculture (in forest loss stratum);

  6. Shrubland – grassland flux: shrubland in 2016 and grassland in 2019 or vice versa;

  7. Catch-all – all 2016 to 2019 land cover changes not included in the other 6 strata. This stratum includes land cover changes at Level II that are not changes at Level I (e.g. pasture to cropland).

Table 1. NLCD 2019 land cover legend (see USGS n.d. in the reference list for a more detailed description).

This sequence of steps (i.e. the exclusions noted above) for assigning each pixel to a stratum was necessary to guarantee that a pixel was assigned to one and only one stratum, as required for stratified sampling; a minimal sketch of the assignment logic is given below.
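The sketch below illustrates one way to encode that mutually exclusive assignment, applying the change strata in the priority order implied by the exclusions in the list above and sending all remaining change (and perennial ice and snow) to the catch-all stratum. The Level I groupings are taken from the Table 1 class codes; the exact priority order, stratum names, and function name are assumptions for illustration, not the NLCD production rules.

```python
# Hypothetical Level II code groupings based on Table 1
URBAN  = {21, 22, 23, 24}        # developed classes
FOREST = {41, 42, 43}
AG     = {81, 82}                # pasture/hay and cultivated crops
ICE    = {12}                    # perennial ice and snow

def assign_stratum(c2016, c2019):
    """Assign a pixel to a single, mutually exclusive stratum from its 2016 and
    2019 Level II map labels, checking the change strata in a priority order
    consistent with the exclusions listed above."""
    if c2019 in URBAN and c2016 not in URBAN:
        return "urban gain"
    if c2016 in FOREST and c2019 not in FOREST:
        return "forest loss"                     # forest-to-urban was already caught above
    if c2019 in FOREST and c2016 not in FOREST:
        return "forest gain"
    if c2016 in AG and c2019 not in AG:
        return "agriculture loss"
    if c2019 in AG and c2016 not in AG:
        return "agriculture gain"
    if {c2016, c2019} == {52, 71}:               # shrubland (52) <-> grassland (71)
        return "shrubland-grassland flux"
    if c2016 != c2019 or c2016 in ICE:
        return "catch-all"                       # stratum 16, incl. Level II-only change and ice/snow
    return f"no change, class {c2016}"           # one of the 15 no-change strata

print(assign_stratum(41, 22))   # forest converted to urban -> urban gain
print(assign_stratum(81, 82))   # pasture to cropland -> catch-all (Level II-only change)
print(assign_stratum(42, 42))   # stable evergreen forest -> no change, class 42
```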

The 22 strata were created by combining the NLCD2019 2016 and 2019 land cover in a Geographic Information System (GIS) and assigning each pixel to the stratum to which it belonged (Table 2). The strata map was then disaggregated into individual (i.e. binary) maps, which were used as a mask to generate a floating-point (double precision) random number map specific to each stratum. Pixels with random number values less than or equal to the ratio n/N, where n was the target sample size and N was the number of pixels in the stratum, were selected for reference data collection. As in previous NLCD accuracy assessments, the strata map excluded coastal waters and the Great Lakes to avoid the likely reality that nearly all sample pixels in the water stratum would coincide with these water features. Nearly 4% of the Land Change Monitoring, Assessment, and Projection (LCMAP) program's (Brown et al. 2020) initial sample locations (Pengra et al. 2020), for example, occurred in coastal waters or the Great Lakes because these features were not masked prior to sample selection.
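A minimal sketch of the per-stratum selection rule described above follows: draw a uniform random number for every pixel in a stratum and keep the pixels whose value is at most n/N. As written, the realized sample size is random with expectation n; any trimming or topping-up to hit the target exactly is not shown and is an implementation detail we are not asserting. The function name and array layout are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

def select_pixels(stratum_mask, n_target):
    """Select pixels within one stratum by comparing a per-pixel uniform random
    number to n/N, where N is the number of pixels in the stratum.
    Returns a boolean mask of selected pixels."""
    N = stratum_mask.sum()
    random_map = rng.random(stratum_mask.shape)          # floating-point random number map
    return stratum_mask & (random_map <= n_target / N)

# Toy stratum of 1,000,000 pixels with a target of 150 samples
mask = np.ones((1000, 1000), dtype=bool)
selected = select_pixels(mask, 150)
print(selected.sum())   # close to 150 on average
```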

Table 2. Strata for the sampling design of the NLCD2019 accuracy assessment. Except for stratum 16, the change strata are resolved at Level I of the classification hierarchy (see Table 1). The Level II perennial ice and snow land cover class and land cover changes at Level II of the classification hierarchy but not at Level I were included in stratum 16. The area of the population (map) is 7,789,691 km2 (8,665,213,260 pixels).

2.2. Response design

Reference data collection followed the protocols established for previous NLCD accuracy assessments (Wickham et al. 2021). The main protocols were: 1) reference data collection by experienced interpreters (Bianchetti and MacEachren 2015); 2) blind interpretation (i.e. assignment of reference land cover labels without knowledge of the map label); 3) reliance on Google Earth™ imagery to address the accuracy assessment principle that the reference medium should be of higher quality than the mapping medium (Olofsson et al. 2014; Stehman and Foody 2019); 4) defining the pixel as the spatial support unit (Stehman and Wickham 2011); 5) reliance on image context (i.e. the neighborhood around the pixel) as a good practice to ensure correct reference label assignment (Stehman and Czaplewski 1998); and 6) use of primary and alternate reference labels to account for "fuzziness" of reference class determinations. Reference data labels for all 3245 sample pixels were assigned by NLCD's most experienced interpreter (protocol 1) using a blind format (protocol 2). The interpreter had participated in all NLCD accuracy assessments going back to NLCD2001 (Wickham et al. 2010). Reference label assignments were further supported by weekly web-enabled conference calls to discuss the most appropriate labels for a subset of the sample pixels. These pixels were selected because they represented difficult cases to classify. Nearly all coauthors participated in the weekly calls. Additional information on response design protocols is documented in the supplemental section of the NLCD2016 accuracy assessment (Wickham et al. 2021).

Reference label assignment was based on interpretation of Google Earth™ imagery (protocol 3) for each selected sample pixel (protocol 4) by overlaying Keyhole Markup Language Zipped (KMZ) files of points representing the center of the sample pixel, the polygon boundary of the 30 x 30 m sample pixel, and a 3-x-3 lattice of cells surrounding (and including) the sample pixel (protocol 5). The Google Earth™ platform includes an extensive collection of temporal, high-resolution imagery through agreements with several vendors. The image source and acquisition date are displayed in compliance with Google Earth™ protocols (Google Earth n.d.). Accompanying KMZ files have been made available (see Data Availability Statement) for those interested in viewing the reference data collected for this assessment.

Primary and alternate reference labels (protocol 6) have been used in all previous NLCD accuracy assessments (Stehman et al. 2003; Wickham et al. 2010, 2013, 2017, 2021) to address inherent class ambiguity (e.g. pasture versus grassland), geolocation errors, and mixed pixels. Use of two (primary and alternate) reference labels can be considered a special case of the linguistic-scale, fuzzy membership analysis (Stehman et al. 2003, 513) promoted by Gopal and Woodcock (1994). Consistent with accuracy assessment reporting for NLCD2011 (Wickham et al. 2017) and NLCD2016 (Wickham et al. 2021), error matrices are based on agreement between the map label and the primary reference label only. User's accuracy (UA), producer's accuracy (PA), and overall accuracy (OA) are estimated from these error matrices. We also provide these accuracy estimates for the definition of agreement that allows a match with either the primary or alternate reference label. Each error matrix therefore includes two expressions of UA, PA, and OA (one for each definition of agreement). Alternate reference labels were used for 43% of the sample pixels at Level II and 28% of the sample pixels at Level I. For comparison, in the NLCD2016 accuracy assessment (Wickham et al. 2021) alternate reference labels were used for 63% of the sample pixels at Level II and 39% of the pixels at Level I.

For change accuracy, the definition of agreement when using the alternate reference labels did not include mixing of primary and alternate reference labels across dates. That is, agreement could occur if the primary reference labels for both 2016 and 2019 matched the map labels in 2016 and 2019, or if the alternate reference labels for both 2016 and 2019 matched the map labels in 2016 and 2019. For example, suppose the NLCD2019 map labels were 50 (shrubland) for 2016 and 70 (grassland) for 2019, the primary reference labels were 50 for both 2016 and 2019, and the alternate reference labels were 50 for 2016 and 70 for 2019. In this example, there would be no agreement between map and reference labels when agreement was based on the primary reference label only, but there would be agreement when it was defined as a match between the map and either the primary or alternate reference label. Changing the example slightly, now suppose the NLCD2019 map labels were the same as defined previously (class 50 for 2016 and class 70 for 2019), but the primary reference labels were 50 for both 2016 and 2019, and the alternate reference labels were 70 for both 2016 and 2019. In this example, there would not be agreement between the map and reference labels because neither the 2016–2019 set of primary labels (class 50 in 2016 and class 50 in 2019) nor the 2016–2019 set of alternate reference labels (70 and 70) matched the 2016–2019 set of map labels (50 and 70, respectively). For the assessment of binary change/no change accuracy, the first example would be considered a match only when agreement included the alternate reference label, and the latter example would be considered disagreement regardless of agreement definition.
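The change agreement rule can be summarized in a few lines of code. The sketch below (function name hypothetical) applies the no-mixing rule: the 2016–2019 pair of map labels must match the pair of primary labels, or the pair of alternate labels, but never one of each. The two printed cases reproduce the examples in the preceding paragraph.

```python
def change_agreement(map_labels, primary, alternate):
    """Agreement rule for 2016-2019 change: the (2016, 2019) pair of map labels
    must match the pair of primary reference labels, or (when the alternate
    label is allowed) the pair of alternate reference labels; primary and
    alternate labels are never mixed across dates.
    Each argument is a (label_2016, label_2019) tuple.
    Returns (primary-only agreement, primary-or-alternate agreement)."""
    primary_match = map_labels == primary
    either_match = primary_match or map_labels == alternate
    return primary_match, either_match

# First example: map = (50, 70), primary = (50, 50), alternate = (50, 70)
print(change_agreement((50, 70), (50, 50), (50, 70)))   # (False, True)
# Second example: map = (50, 70), primary = (50, 50), alternate = (70, 70)
print(change_agreement((50, 70), (50, 50), (70, 70)))   # (False, False)
```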

There was no available ca. 2016 Google Earth™ imagery for 164 sample pixels, and nearly 43% (1385) of the sample pixels had no ca. 2019 Google Earth™ imagery. Only 56% (1814) of the sample pixels had Google Earth™ imagery for both 2016 and 2019; 118 sample pixels had no Google Earth™ imagery for either date. Landsat color composites were used as the reference medium when Google Earth™ imagery was unavailable. We evaluated the impact of missing Google Earth™ imagery on estimated accuracy for the single-date NLCD2019 2019 land cover and 2016–2019 binary change by comparing estimated accuracy for two non-overlapping subsets of the sample, one with the requisite Google Earth™ reference imagery and one without. We treated the two sample subsets as domains (SAS Institute Inc. 2017, 9175) in the statistical assessment. For the single-date NLCD2019 2019 comparison, there were 1860 sample pixels in the Google Earth™ imagery available domain and 1385 sample pixels in the Google Earth™ imagery missing domain. For the 2016–2019 comparison, there were 1822 sample pixels in the Google Earth™ imagery available domain and 1423 sample pixels in the Google Earth™ imagery missing domain. We did not assess the effect of missing Google Earth™ imagery on the single-date NLCD2019 2016 land cover because the relatively small percentage (5%) of missing ca. 2016 Google Earth™ imagery resulted in too few sample pixels to make the comparison meaningful.

2.3. Analysis

Analysis followed the general estimation theory of probability sampling (cf. Särndal, Swensson, and Wretman 1992), which requires the inclusion probabilities derived from the stratification (Section 2.1) (Stehman 2001; Stehman and Czaplewski 1998) and an indicator variable formulation of the estimators (Stehman 2014). Overall accuracy (OA) was estimated as:

$$\widehat{OA} = \frac{1}{N}\sum_{h=1}^{H} N_h\,\hat{p}_h \qquad (1)$$

where $\hat{p}_h$ is the sample proportion of correctly classified pixels in stratum h, $N$ is the total number of pixels in the population (i.e. $N = \sum_{h=1}^{H} N_h$), $N_h$ is the population size (number of pixels) of stratum h, and the summation is over all H strata (H = 22). User's accuracy (UA) and producer's accuracy (PA) were expressed as a ratio, $R = Y/X$, where Y is the population total of $y_u$, X is the population total of $x_u$, and $y_u$ and $x_u$ are indicator variables. For both user's and producer's accuracy of class K, $y_u = 1$ when the map and reference labels are both class K, and $y_u = 0$ otherwise. For user's accuracy, $x_u = 1$ if sample pixel u is map class K, and $x_u = 0$ otherwise. For producer's accuracy, $x_u = 1$ if sample pixel u is reference class K, and $x_u = 0$ otherwise. The combined ratio estimator (Cochran 1977, Section 6.11) for UA and PA is then:

$$\hat{R} = \frac{\hat{Y}}{\hat{X}} = \frac{\sum_{h=1}^{H} N_h\,\bar{y}_h}{\sum_{h=1}^{H} N_h\,\bar{x}_h} \qquad (2)$$

where $\bar{x}_h$ is the sample mean (proportion) of $x_u$ in stratum h and $\bar{y}_h$ is the sample mean of $y_u$ in stratum h. The estimated standard error of the combined ratio estimator is:

$$SE(\hat{R}) = \sqrt{\frac{1}{\hat{X}^2}\sum_{h=1}^{H} N_h^2\left(1 - \frac{n_h}{N_h}\right)\frac{s_{yh}^2 + \hat{R}^2 s_{xh}^2 - 2\hat{R}\,s_{xyh}}{n_h}} \qquad (3)$$

where $n_h$ is the sample size in stratum h, $s_{yh}^2$ and $s_{xh}^2$ are the sample variances of $y_u$ and $x_u$ in stratum h, $s_{xyh}$ is the sample covariance of $y_u$ and $x_u$ in stratum h, and

$$\hat{X} = \sum_{h=1}^{H} N_h\,\bar{x}_h \qquad (4)$$

is the estimated population total of $x_u$. Estimates were computed using SAS (version 9.4) (SAS Institute Inc. 2017).
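For readers who prefer code to notation, the following sketch implements Equations (1)–(4) with NumPy for a stratified sample held in parallel arrays. The function names, argument names, and toy data are assumptions for illustration; the published estimates were produced in SAS, not with this code.

```python
import numpy as np

def ratio_accuracy(stratum, map_lab, ref_lab, N_h, k, target="user"):
    """Combined ratio estimator (Eq. 2) of user's or producer's accuracy for
    class k, with its standard error (Eqs. 3-4), from a stratified random sample.
    stratum, map_lab, ref_lab : 1-D sequences, one entry per sample pixel
    N_h : dict mapping stratum id -> number of pixels in that stratum (population)"""
    stratum, map_lab, ref_lab = map(np.asarray, (stratum, map_lab, ref_lab))
    y = ((map_lab == k) & (ref_lab == k)).astype(float)       # indicator: correctly labeled class k
    x = (map_lab == k if target == "user" else ref_lab == k).astype(float)

    Y_hat = X_hat = 0.0
    terms = []
    for h, Nh in N_h.items():
        sel = stratum == h
        n_h = sel.sum()
        if n_h == 0:
            continue
        Y_hat += Nh * y[sel].mean()
        X_hat += Nh * x[sel].mean()
        if n_h > 1:
            s2y, s2x = y[sel].var(ddof=1), x[sel].var(ddof=1)
            sxy = np.cov(y[sel], x[sel])[0, 1]                 # sample covariance
            terms.append((Nh, n_h, s2y, s2x, sxy))
    R_hat = Y_hat / X_hat                                      # Eq. (2)
    var = sum(Nh**2 * (1 - n_h / Nh) * (s2y + R_hat**2 * s2x - 2 * R_hat * sxy) / n_h
              for Nh, n_h, s2y, s2x, sxy in terms) / X_hat**2
    return R_hat, np.sqrt(var)                                 # estimate and SE (Eq. 3)

def overall_accuracy(stratum, map_lab, ref_lab, N_h):
    """Stratified estimator of overall accuracy (Eq. 1)."""
    stratum = np.asarray(stratum)
    correct = (np.asarray(map_lab) == np.asarray(ref_lab)).astype(float)
    N = sum(N_h.values())                                      # total pixels in the population
    return sum(Nh * correct[stratum == h].mean()
               for h, Nh in N_h.items() if (stratum == h).any()) / N

# Toy usage with hypothetical stratum sizes and labels
N_h = {1: 5_000_000, 2: 80_000_000}
stratum = [1, 1, 1, 2, 2, 2]
map_lab = [41, 41, 21, 41, 41, 41]
ref_lab = [41, 21, 21, 41, 41, 71]
print(ratio_accuracy(stratum, map_lab, ref_lab, N_h, k=41, target="user"))
print(overall_accuracy(stratum, map_lab, ref_lab, N_h))
```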

Error matrices, with the reference labels forming the columns and the map labels forming the rows, were used to report results. Cell entries are the estimated percent of area of the conterminous United States (i.e. map area) for each realized combination of map and reference label. Multiplying the area of the conterminous United States (7,789,691 km2; Table 2) by a cell proportion estimates the area labeled as map class A and reference class B. UA and PA for both definitions of agreement – map label matches the primary reference label only, and map label matches either the primary or alternate reference label – are reported with the error matrices. Estimates of map bias, calculated as the map class area proportion minus the reference class area proportion (Olofsson et al. 2014), are also reported for each error matrix.
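As a small worked example of the map bias calculation, the snippet below uses a toy three-class error matrix expressed as percent of map area (the values are invented for illustration, not taken from Tables 4–7) and converts each class's row-minus-column difference into a bias in percentage points and square kilometers.

```python
import numpy as np

# Toy error matrix of estimated percent of map area (rows = map label,
# columns = primary reference label); the class subset and values are hypothetical.
classes = ["water", "urban", "forest"]
p = np.array([[4.8, 0.1, 0.1],
              [0.2, 5.5, 0.3],
              [0.1, 0.4, 88.5]])

map_area_km2 = 7_789_691                        # conterminous U.S. map area (Table 2)
map_bias_pct = p.sum(axis=1) - p.sum(axis=0)    # row (map) total minus column (reference) total
map_bias_km2 = map_bias_pct / 100 * map_area_km2
for c, pct, km2 in zip(classes, map_bias_pct, map_bias_km2):
    print(f"{c}: {pct:+.1f} percentage points ({km2:+,.0f} km2)")
```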

2.4. Post-analysis review

The post-analysis review was a reexamination of reference and map labels following completion of accuracy estimation. The purpose of the review was to uncover error patterns that could be used to inform future NLCD land cover products and accuracy assessments. The results from the post-analysis review were not used in any of the accuracy results reported herein. The review, conducted by three of the coauthors, focused on error patterns in most of the change strata and the open urban class (Level II code = 21) (Table 3). The review was a systematic examination of selected sample pixels that had mismatched map and reference labels, supported by Google Earth™ imagery, Landsat color composites, and the land cover map itself. The post-analysis review focused primarily on the land cover change strata to further support the land cover change monitoring objective of NLCD (Homer et al. 2020; Yang et al. 2018). The Level II open urban class was included because of its consistently high commission error reported in previous NLCD accuracy assessments (e.g. Wickham et al. 2021).

Table 3. Strata from which sample pixels were selected for post-analysis review.

3. Results

3.1. Single-date accuracy

At Level II of the classification hierarchy, OA for both NLCD2019 2016 and NLCD2019 2019 was 77.5% ± 1.0% (± value is the standard error) when agreement was based on a match between the map label and primary reference label only and improved to 87.1% ± 0.7% when agreement also included the alternate reference label (Tables 4 and 5). Based on the primary-only agreement definition, the OA for NLCD2019 2016 represents a substantial improvement of 5% over that reported for NLCD2016 2016 (Wickham et al. 2021). The 5% improvement in OA represents an increase of about 390,000 km2 in the area that is accurately mapped in NLCD2019 2016 compared to NLCD2016 2016. The substantial increase in NLCD2019 2016 OA relative to NLCD2016 2016 OA did not hold when agreement also included the alternate reference label, as the OA for NLCD2016 2016 was 86.9% (Wickham et al. 2021) compared to 87.1% for NLCD2019 2016. At Level I of the classification hierarchy, OA was 83.1% ± 0.9% for both the 2016 and 2019 land cover dates in NLCD2019 when a match included only the primary reference label and improved to 90.3% ± 0.7% when agreement also included the alternate reference label (Tables 6 and 7). Similar to Level II, the Level I OA of 83.1% for NLCD2019 2016 under the primary-only definition of agreement was a 4% improvement over NLCD2016 2016 for the same agreement definition.

Table 4. Level II agreement between map and reference labels for NLCD2019 2019 for the conterminous United States. Class codes are in Table 1. Cell entries are percentage of map area (7,789,691 km2; see also Table 2). OA, PA, and UA are overall, producer's, and user's accuracies (and their standard errors) based on agreement defined as the map label matching the primary reference label only. OAa, PAa, and UAa are the corresponding estimates for agreement defined as the map label matching either the primary or alternate reference label. SE is the standard error of the class area estimates based on the reference (column) label. Map bias (MB) is the row (map) total minus the reference (column) total. MB is the error attributable to estimating class area based on the map (i.e. pixel counting), as discussed in Olofsson et al. (2014). Sum of MB ≠ 0 due to rounding error. Accuracy estimates (e.g. UA, UAa) and their SE are rounded to the nearest integer. OA = 77.5% ± 1.0%; OAa = 87.1% ± 0.7%.

Table 5. Level II agreement between map and reference labels for NLCD2019 2016 for the conterminous United States. See Table 4 for an explanation of contents. Sum of MB ≠ 0 due to rounding error. OA = 77.6% ± 1.0%; OAa = 87.2% ± 0.7%.

Table 6. Level I agreement between map and reference labels for NLCD2019 2019. See Table 4 for an explanation of contents. Sum of MB ≠ 0 due to rounding error. OA = 83.1% ± 0.9%; OAa = 90.3% ± 0.7%.

Table 7. Level I agreement between map and reference labels for NLCD2019 2016. See Table 4 for an explanation of contents. Sum of MB ≠ 0 due to rounding error. OA = 83.1% ± 0.9%; OAa = 90.5% ± 0.7%.

Missing Google Earth™ reference imagery appeared to have a counterintuitive effect on NLCD2019 2019 accuracy (Table 8). That is, reference labels interpreted from Landsat tended to agree with the NLCD (machine) classification of Landsat more often than reference labels interpreted from Google Earth™ imagery did (i.e. the with-Google Earth™ subset). OA was about 6% higher for the subset missing Google Earth™ reference imagery than for the subset with Google Earth™ reference imagery, regardless of agreement definition. The differences in OA between the subsets were also evident in the UA and PA values. Overall, UA and PA tended to be higher for the subset of sample pixels that did not have 2019 Google Earth™ reference imagery than for the subset that did. UA for the medium- and high-density urban classes (Level II = 23 and 24) was an exception to this tendency: UA for these two classes was much higher in the with-Google Earth™ subset than in the without-Google Earth™ subset. The high PA values for classes 23 and 24 in the without-Google Earth™ subset were likely attributable to very low sample counts, which were <10 for both classes.

Table 8. Comparison of UA and PA, with standard errors (SE), for the single-date NLCD2019 2019 for sample pixels with ca. 2019 Google Earth™ reference imagery and without ca. 2019 Google Earth™ reference imagery. UA and PA are based on a match between the map label and either the primary or alternate reference label.

3.2. Binary and theme-specific change accuracy

Accuracy of binary change versus no change was consistent with the NLCD2016 accuracy assessment (Wickham et al. 2021). OA was 83% when agreement was defined as a match between the map label and only the primary reference label and improved to 90% when agreement also included the alternate reference label (Table 9). OA reflects the dominance and high UA and PA of the no change class, as UA and PA of no change were >95% regardless of agreement definition. UA and PA for the change class were 41% and 42%, respectively, when agreement included only the primary reference label and improved to 48% and 63%, respectively, when the alternate reference label was also included in the agreement definition.

Table 9. Binary (no change/change) agreement between map and reference labels. See Table 4 for an explanation of contents. OA = 83.1% ± 0.9%; OAa = 90.5% ± 0.7%.

The effect of missing Google Earth™ reference imagery on binary change accuracy was similar to the result for the single-date NLCD2019 2019 land cover. Regardless of agreement definition, change UA was nearly identical across the with- and without-Google Earth™ domains, and hence nearly equivalent to the UA values reported in Table 9 for the two agreement definitions. In contrast, binary change PA was higher for the without-Google Earth™ domain than for the with-Google Earth™ domain. However, the SE increased substantially for the without-Google Earth™ domain, indicating that the differences may be within the range of sampling variability of the PA estimates for the two domains.

Previous NLCD accuracy assessments have reported UA and PA for loss and gain themes for water, urban (gain only), forest, shrub, grass, and agriculture based on agreement between the map label and either the primary or alternate reference label (Wickham et al. 2021). Using the same agreement definition, loss and gain UA and PA for each of these themes were about the same for NLCD2019 2016–2019 change as for NLCD2016 2011–2016 change reported in Wickham et al. (2021), although substantial declines were slightly more common than substantial increases. UA and PA for the NLCD2019 2016–2019 no change classes improved slightly compared to their NLCD2016 2011–2016 counterparts. Urban gain UA and PA for NLCD2019 were very low and much lower than in previous NLCD accuracy assessments. Forest loss UA was consistent with that reported for 2011–2016, but forest loss PA was lower.

Table 10. Comparison of UA and PA for loss, gain, and no change themes for NLCD2019 versus NLCD2016. Differences with non-overlapping standard errors (SE) were attributed as substantial increases (↑) or decreases (↓). Agreement is based on a match between the map label and either the primary or alternate reference label.

Not surprisingly, the low UA and PA across loss and gain themes were dominated by confusion with the theme-specific no change and the theme-specific never (e.g. never urban) categories of the 4-x-4 change error matrices. For example, the dominant source of urban gain omission error was area mapped as never urban (Table 11), and similarly the largest source of forest gain omission error was area mapped as never forest (Table 12). The primary source of forest gain commission error was area of stable forest that was incorrectly mapped as forest gain (Table 12). The never category was almost exclusively the highest off-diagonal cell entry for both commission and omission error for the other change themes (results not shown).

Table 11. Urban (U) change accuracy. Values of (0) for Standard Error (SE) are less than 0.5%.

Table 12. Forest (F) change accuracy. Values of (0) for Standard Error (SE) are less than 0.5%.

3.3. Post-analysis review

Several error patterns were uncovered as a result of conducting the post-analysis review.

  • Urban gain accuracy was impacted by road data error. Disagreement for 13 of the 35 sample pixels examined appeared to be attributable to road data error. Roads that were not constructed between ca. 2016 and ca. 2019, based on examination of temporal Google Earth™ imagery, were identified by NLCD as new road construction between ca. 2016 and ca. 2019. These sample pixels tended to occur in remote areas that had small, often unpaved roads, or no roads (i.e. not visible on Google Earth™ imagery).

  • Forest loss accuracy did not show any strongly discernable patterns in the post-analysis review. In most cases a loss in vegetation biomass was not apparent in the Google Earth™ imagery. At a continental scale, there was a much higher rate of forest loss map-reference agreement in the eastern U.S. than the western U.S.

  • Forest gain accuracy did not show any strongly discernable patterns in the post-analysis review. In most cases, an increase in vegetation biomass was not apparent in the Google Earth™ imagery. Consequently, reference labels for sample pixels mapped as forest gain most often indicated that change did not occur. The geographic pattern of the sample pixels mapped as forest gain, being concentrated in the Pacific Northwest and southeast, appeared to be strongly associated with timber harvest and regeneration activities. There was a much higher likelihood of forest gain map-reference agreement in the southeast than elsewhere.

  • Agriculture loss error revealed no discernable patterns in the post-analysis review. More broadly, reference labels for sample pixels in the agriculture loss stratum almost always indicated no change. Reference labels indicating change comprised only 20% of the sample pixels mapped as agriculture loss.

  • Agriculture gain error was similar to agriculture loss. Most sample pixels mapped as agriculture gain had reference labels indicating no change. The main difference in the error patterns between agriculture loss and agriculture gain was the spread across the eight no-change Level I classes. For agriculture gain, almost all reference labels indicated static agriculture, whereas the no-change reference labels for agriculture loss were spread more evenly across urban, barren, agriculture, and wetland.

  • Shrubland-grassland flux error appeared to be associated with aridity. Sample pixels that coincided with low, sparse shrub cover, as depicted in the Google Earth™ imagery, tended to have mismatched map and reference labels. For example, about a dozen sample pixels mapped as shrubland-grassland flux were in west Texas and New Mexico. Nearly all had mismatched map and reference labels, and the vegetation cover apparent in the Google Earth™ imagery tended to be a sparse coverage of small shrubs punctuating a more widespread coverage of grass. Agreement was much higher in the eastern U.S., especially the southeast, and was associated with apparent tree harvesting and regeneration activities. Nearly 70% of the sample pixels of this type of mapped change located east of the Mississippi River had matching map and reference labels.

  • Open urban error was confounded by class ambiguity and the presence of remote roads. Upon reexamination of the map and reference labels, a case could be made for assigning map-matching reference labels (primary or alternate) to 20 of the 50 sample pixels examined. For the sample pixels that had 81 (pasture) or 71 (grassland) as the map label and open urban as the primary reference label, disagreement appeared to be a matter of context. These sample pixels tended to occur in rural areas of the Midwest and Great Plains coincident with agricultural buildings. Ancillary data used to model impervious area include these isolated structures but not the yards and managed vegetation around them. For the sample pixels that had forest (41, 42, or 43) as the map label and open urban as the primary reference label, disagreement appeared to stem from a lack of clarity on the class of road included in the models used to produce NLCD land cover. Many of these sample pixels were in remote areas, often associated with apparent tree harvesting and regeneration activities, that had unpaved roads within or adjacent to the sample pixel.

4. Discussion

Accuracy of NLCD2019 was consistent with previous releases of NLCD (Wickham et al. 2021). Level II and Level I single-date accuracies were 87% and 90%, respectively, when the alternate reference label was included in the definition of agreement. Change accuracies were also consistent with previous assessments of NLCD land cover change. Improvements in the precision of change PA estimates may have been realized if the sample size allocation to strata were tailored to mitigate the effect of the area dominance of the no change class (Olofsson et al. 2020; Stehman and Foody 2019). That is, the standard error of the PA estimate is reduced when larger sample sizes are allocated to the common (larger) strata. However, a primary motivation of the stratified design was to achieve adequately small standard errors for UA of rare classes, which requires increasing the sample size for rare classes. Therefore, precise estimation of PA and UA are competing objectives in terms of how to allocate the sample to strata. The only way to resolve the conflict would have been to increase the total sample size and allocate these additional sample pixels to the common strata. Unfortunately, programmatic constraints could not accommodate adding additional accuracy assessment objectives beyond optimizing for small UA standard errors for rare classes.

The post-analysis review, adopted as an additional accuracy assessment protocol new to NLCD, revealed error patterns that may lead to improvements in land cover production and reference data collection. Identification of road data error in urban gain sample pixels provides information that may be useful to NLCD map producers. Editing of road data is a component of NLCD production (Homer et al. 2020). Because of program changes, NLCD lost access to the GIS road dataset used to produce NLCD2011 and NLCD2016. Changes in GIS road data sources may have contributed to the attribution of roads as newly constructed between ca. 2016 and ca. 2019, which likely reduced urban gain accuracy. The error pattern in the shrubland – grassland flux stratum suggests that methods for classifying change between shrubland and grassland could be refined. Temporal NDVI patterns or products from other MRLC members (Wickham et al. 2014) could be incorporated into the modeling process, which might lead to improved accuracy of mapped changes from shrubland to grassland and vice versa. Likewise, the error patterns in the agriculture gain and loss change themes also suggest additional post processing could improve the accuracy of these strata. Post processing could include a direct comparison with map labels from the annually produced Cropland Data Layer (CDL) (Boryan et al. 2011). CDL reports high thematic accuracy for major crop types such as corn, soybeans, and cotton that comprise the majority of area devoted to crops (Boryan et al. 2011; Lark, Schelly, and Gibbs 2021).

The post-analysis review also highlighted areas where NLCD reference label protocols could be improved. In particular, uncertainty regarding whether the presence of a road should factor into the reference classification (e.g. Level II class = 21) appears to have contributed to mismatched map and reference labels. Unpaved roads are also a frequent source of uncertainty. The response design protocols (Stehman and Czaplewski 1998) developed and used by NLCD (Wickham et al. 2021, Supplemental Information) did not include guidelines for when and how roads should factor into the reference label assignment. These guidelines could be updated to include more specific definitions for inclusion of roads in the developed (urban) classes.

We were surprised by the counterintuitive impact of reference imagery availability on accuracy results. We would have expected the lack of Google Earth™ imagery to have a dampening effect on accuracy. Reliance on reference data that are of higher quality than the data used to derive the map classification is a fundamental principle of accuracy assessment (Olofsson et al. 2014; Stehman and Foody 2019). Higher quality is most commonly interpreted as higher spatial resolution, but guidance for using Landsat images as reference data has also been articulated (Olofsson et al. 2014; Stehman and Foody 2019). We speculate that experience may have been an influential factor contributing to the counterintuitive results. The reference classification was done by the NLCD project's most experienced interpreter, who has contributed to or led reference data collection for all previous NLCD accuracy assessments. Experience is recognized as an important factor in the collection of accurate reference classification data (Bianchetti and MacEachren 2015; Van Coillie et al. 2014). However, we are not aware of any studies examining interpreter experience as a factor mitigating the potential negative influence of poor quality or missing reference imagery. NLCD reference data have, as a standard of practice, included a "notes" field, which is used to record the context of sample pixels and any interpretation challenges (e.g. uncertainty in reference label assignment). We searched the "notes" field of the 1431 sample pixels that had missing 2016 or 2019 Google Earth™ imagery for phrases indicating that missing Google Earth™ imagery complicated reference label assignment (e.g. "hard to tell without 2019 Google Earth™ imagery"). Only 41 of the 1431 (3%) sample pixels with missing 2016 or 2019 Google Earth™ imagery included such phrases, suggesting the missing Google Earth™ imagery made reference label assignment a challenge only rarely. More precisely, we speculate that, with sufficient experience comparing high-resolution and Landsat imagery in tandem to collect reference classification data, the hindrance imposed by the absence of the high-resolution component diminishes because of the wealth of information the interpreter has accrued through all previous efforts.

5. Summary and conclusion

Overall, NLCD2019 land cover thematic accuracy results are consistent with previous NLCD land cover accuracy assessments. Single-date Level II and Level I overall accuracies for NLCD2019 2016 improved measurably compared to their counterparts reported for NLCD2016 2016 (Wickham et al. 2021). Differences in UA and PA for loss and gain change themes between NLCD2019 2016–2019 and NLCD2016 2011–2016 were equivocal. Accuracy of NLCD2019 binary change for 2016–2019 was nearly identical to NLCD2016 binary change accuracy for 2011–2016 (Wickham et al. 2021), and there was a mix of substantial UA and PA increases and decreases across the loss and gain change themes when compared to the previous NLCD land cover accuracy assessment (Table 10). The history of NLCD accuracy assessment, to which this article contributes, indicates that NLCD produces highly accurate single-date land cover maps at Level II and Level I, but similarly accurate mapping of land cover change continues to be elusive. Only forest loss user's accuracy has been consistently ≥70% across all NLCD land cover change accuracy assessments (Wickham et al. 2013, 2017, 2021). Protocols such as the post-analysis review of map errors may contribute to the ongoing effort to improve NLCD change accuracy.

Acknowledgments

This document has been reviewed by the U.S. Environmental Protection Agency (EPA), Office of Research and Development, and approved for publication. The views expressed in this paper are those of the authors and do not necessarily reflect the views or policies of the U.S. Environmental Protection Agency. This paper has been peer reviewed and approved for publication consistent with U.S. Geological Survey (USGS) Fundamental Science Practices (https://pubs.usgs.gov/circ/1367/). We also wish to thank Jeremy Baynes (EPA) and anonymous reviewers for their thoughtful comments on earlier versions of the paper. Funding support for Steve Stehman was provided via contract G17AC00237 between the State University of New York – Environmental Sciences and Forestry (SUNY-ESF) and USGS. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The data are available on the U.S. EPA environmental dataset gateway at https://edg.epa.gov/EPADataCommons/Public/ORD/EnviroAtlas/NLCD2019_AA_RefData.zip. The zipped file includes the requisite ESRI shapefiles and three KMZ files that can be used for display in Google Earth™. The three KMZ files are the reference point (pixel center) locations, the reference pixel boundary, and the boundaries of the 3 x 3-pixel neighborhood surrounding the reference pixel. The KMZ point file includes reference and map labels and associated attributes. The shapefiles are in an Albers conic equal area projection. Projection parameters and attribute descriptions are in the associated XML files.

References

  • Bianchetti, R. A., and A. M. MacEachren. 2015. “Cognitive Themes Emerging from Air Photo Interpretation Texts Published to 1960.” ISPRS International Journal of Geo-Information 4 (2): 551–571. doi:10.3390/ijgi4020551.
  • Boryan, C., Z. Yang, R. Mueller, and M. Craig. 2011. “Monitoring US Agriculture: The US Department of Agriculture, National Agricultural Statistics Service, Cropland Data Layer Program.” Geocarto International 26 (5): 341–358. doi:10.1080/10106049.2011.562309.
  • Brown, J. F., H. J. Tollerud, C. P. Barber, Q. Zhou, J. L. Dwyer, J. E. Vogelmann, T. R. Loveland, et al. 2020. “Lessons Learned Implementing an Operational Continuous United States National Land Change Monitoring Capability: The Land Change Monitoring, Assessment, and Projection (LCMAP) Approach.” Remote Sensing of Environment 238: 111356. doi:10.1016/j.rse.2019.111356.
  • Cochran, W. G. 1977. Sampling Techniques. 3rd ed. New York: John Wiley & Sons.
  • EPA. 2022. “Inventory of U.S. Greenhouse Gas Emissions and Sinks: 1990-2020.” U.S. Environmental Protection Agency, EPA 430-R-22-003. https://www.epa.gov/system/files/documents/2022-04/us-ghg-inventory-2022-main-text.pdf
  • FGDC (Federal Geographic Data Committee). 2021. “2020 Lead Cover Agency, NGDA Theme Annual Performance Report for Land Use – Land Cover Theme Geospatial Data Act of 2018, Section 756 Requirements.” https://www.fgdc.gov/gda/gda-lca-theme-reports/fy2020-land-use-land-cover-theme-gda-annual-report.pdf
  • Flood, N. 2013. “Seasonal Composite Landsat TM/ETM+ Images Using the Medoid (A Multi-Dimensional Median).” Remote Sensing 5 (12): 6481–6500. doi:10.3390/rs5126481.
  • Google Earth. n.d. “How Images are Collected.” Accessed January 4 2023. https://support.google.com/earth/answer/6327779?hl=en
  • Gopal, S., and C. Woodcock. 1994. “Theory and Methods for Accuracy Assessment of Thematic Maps Using Fuzzy Sets.” Photogrammetric Engineering & Remote Sensing 60: 181–188.
  • Hansen, M. C., and T. R. Loveland. 2012. “A Review of Large Area Monitoring of Land Cover Change Using Landsat Data.” Remote Sensing of Environment 122: 66–74. doi:10.1016/j.rse.2011.08.024.
  • Holben, B. N. 1986. “Characteristics of Maximum-Value Composite Images from Temporal AVHRR Data.” International Journal of Remote Sensing 7 (11): 1417–1434. doi:10.1080/01431168608948945.
  • Homer, C., J. Dewitz, S. Jin, G. Xian, C. Costello, P. Danielson, L. Gass, et al. 2020. “Conterminous United States Land Cover Change Patterns 2001 – 2016 from the 2016 National Land Cover Database.” ISPRS Journal of Photogrammetry and Remote Sensing 162: 184–199. doi:10.1016/j.isprsjprs.2020.02.019.
  • Jin, S., J. Dewitz, P. Danielson, B. Granneman, C. Costello, K. Smith, and Z. Zhu. in press. “National Land Cover Database 2019: A New Strategy for Creating Clean Leaf-On and Leaf-Off Image Composites.” Journal of Remote Sensing. doi:10.34133/remotesensing.0022.
  • Jin, S., C. Homer, L. Yang, P. Danielson, J. Dewitz, C. Li, Z. Zhu, G. Xian, and D. Howard. 2019. “Overall Methodology Design for the United States National Land Cover Database 2016 Products.” Remote Sensing 11 (24): 2971. doi:10.3390/rs11242971.
  • Lark, T. J., I. H. Schelly, and H. K. Gibbs. 2021. “Accuracy, Bias, and Improvements in Mapping Crop and Cropland Across the United States Using the USDA Cropland Data Layer.” Remote Sensing 13 (5): 968. doi:10.3390/rs13050968.
  • Loveland, T. R., and D. M. Shaw. 1996. “Multiresolution Land Characterization: Building Collaborative Partnerships.” In GAP Analysis: A Landscape Approach to Biodiversity Planning, edited by J. M. Scott, T. Tear, and F. Davis, 83–89. Proceedings of the ASPRS/GAP Symposium, Charlotte, NC: National Biological Service.
  • Olofsson, P., P. Arévalo, A. B. Espejo, C. Green, E. Lindquist, R. E. McRoberts, and M. J. Sanz. 2020. “Mitigating the Effects of Omission Errors on Area and Area Change Estimates.” Remote Sensing of Environment 236: 111492. doi:10.1016/j.rse.2019.111492.
  • Olofsson, P., G. M. Foody, M. Herold, S. V. Stehman, C. E. Woodcock, and M. A. Wulder. 2014. “Good Practices for Estimating Area and Assessing Accuracy of Land Change.” Remote Sensing of Environment 148: 42–57. doi:10.1016/j.rse.2014.02.015.
  • Pengra, B. W., S. V. Stehman, J. A. Horton, D. J. Dockter, T. A. Schroeder, Z. Yang, W. B. Cohen, S. P. Healey, and T. R. Loveland. 2020. “Quality Control and Interpreter Consistency of Annual Land Cover Reference Data in an Operational National Monitoring Program.” Remote Sensing of Environment 238: 111261. doi:10.1016/j.rse.2019.111261.
  • Qiu, S., Z. Zhu, P. Olofsson, C. E. Woodcock, and S. Jin. 2023. “Evaluation of Landsat Image Compositing Algorithms.” Remote Sensing of Environment. doi:10.1016/j.rse.2022.113375.
  • Roy, D. P., J. Ju, K. Kline, P. L. Scaramuzza, V. Kovalskyy, M. Hansen, T. R. Loveland, E. Vermote, and C. Zhang. 2010. “Web-Enabled Landsat Data (WELD): Landsat ETM+ Composited Mosaics of the Conterminous United States.” Remote Sensing of Environment 114 (1): 35–49. doi:10.1016/j.rse.2009.08.011.
  • Särndal, C. E., B. Swensson, and J. Wretman. 1992. Model-Assisted Survey Sampling. New York: Springer-Verlag.
  • SAS Institute Inc. 2017. SAS/STAT® 14.3 User’s Guide. Cary, NC.
  • Stehman, S. V. 2001. “Statistical Rigor and Practical Utility in Thematic Map Accuracy Assessment.” Photogrammetric Engineering & Remote Sensing 67: 727–734.
  • Stehman, S. V. 2014. “Estimating Area and Map Accuracy for Stratified Random Sampling When the Strata are Different from the Map Classes.” International Journal of Remote Sensing 35 (13): 4923–4939. doi:10.1080/01431161.2014.930207.
  • Stehman, S. V., and R. L. Czaplewski. 1998. “Design and Analysis for Thematic Map Accuracy Assessment: Fundamental Principles.” Remote Sensing of Environment 64 (3): 331–344. doi:10.1016/S0034-4257(98)00010-8.
  • Stehman, S. V., and G. M. Foody. 2019. “Key Issues in Rigorous Accuracy Assessment of Land Cover Products.” Remote Sensing of Environment 231: 111199. doi:10.1016/j.rse.2019.05.018.
  • Stehman, S. V., and J. D. Wickham. 2011. “Pixels, Blocks of Pixels, and Polygons: Choosing a Spatial Unit for Thematic Accuracy Assessment.” Remote Sensing of Environment 115 (12): 3044–3055. doi:10.1016/j.rse.2011.06.007.
  • Stehman, S. V., J. D. Wickham, J. H. Smith, and L. Yang. 2003. “Thematic Accuracy of the 1992 National Land-Cover Data (NLCD) for the Eastern United States: Statistical Methodology and Regional Results.” Remote Sensing of Environment 86 (4): 500–516. doi:10.1016/S0034-4257(03)00128-7.
  • USGS. 2021. “Landsat 4-7 Collection 2 (C2) Level 2 Science Product (L2SP) Guide. Version 4.” U.S. Geological Survey, LSDS-1618, September. https://d9-wret.s3.us-west-2.amazonaws.com/assets/palladium/production/s3fs-public/media/files/LSDS-1618_Landsat-4-7_C2-L2-ScienceProductGuide-v4.pdf
  • USGS. 2022. “Landsat 8-9 Collection 2 (C2) Level 2 Science Product (L2SP) Guide. Version 4.” U.S. Geological Survey, LSDS-1619, March. https://d9-wret.s3.us-west-2.amazonaws.com/assets/palladium/production/s3fs-public/media/files/LSDS-1619_Landsat-8-9-C2-L2-ScienceProductGuide-v4.pdf
  • USGS. n.d. “National Land Cover Database Class Legend and Description.” U.S. Geological Survey. Accessed January 4 2023. https://www.mrlc.gov/data/legends/national-land-cover-database-class-legend-and-description
  • Van Coillie, F. M. B., S. Gardin, F. Anseel, W. Duyck, L. P. C. Verbeke, and R. R. De Wulf. 2014. “Variability of Operator Performance in Remote Sensing Image Interpretation: The Importance of Human and External Factors.” International Journal of Remote Sensing 35 (2): 754–778. doi:10.1080/01431161.2013.873152.
  • Wickham, J., C. Homer, J. Vogelmann, A. McKerrow, R. Mueller, N. Herold, and J. Coulston. 2014. “The Multi-Resolution Land Characteristics (MRLC) Consortium — 20 Years of Development and Integration of USA National Land Cover Data.” Remote Sensing 6: 7424–7441. doi:10.3390/rs6087424.
  • Wickham, J. D., S. V. Stehman, J. A. Fry, J. H. Smith, and C. G. Homer. 2010. “Thematic Accuracy of the NLCD 2001 Land Cover for the Conterminous United States.” Remote Sensing of Environment 114 (6): 1286–1296. doi:10.1016/j.rse.2010.01.018.
  • Wickham, J., S. V. Stehman, L. Gass, J. Dewitz, J. A. Fry, and T. G. Wade. 2013. “Accuracy Assessment of NLCD 2006 Land Cover and Impervious Surface.” Remote Sensing of Environment 130: 294–304. doi:10.1016/j.rse.2012.12.001.
  • Wickham, J., S. V. Stehman, L. Gass, J. A. Dewitz, D. G. Sorenson, B. J. Granneman, R. V. Poss, and L. A. Baer. 2017. “The Accuracy Assessment of the 2011 National Land Cover Database (NLCD).” Remote Sensing of Environment 191: 328–341. doi:10.1016/j.rse.2016.12.026.
  • Wickham, J., S. V. Stehman, D. G. Sorenson, L. Gass, and J. A. Dewitz. 2021. “Thematic Accuracy of the NLCD 2016 Land Cover for the Conterminous United States.” Remote Sensing of Environment 257: 112357. doi:10.1016/j.rse.2021.112357.
  • Wulder, M. A., T. R. Loveland, D. P. Roy, C. J. Crawford, J. G. Masek, C. E. Woodcock, R. G. Allen, et al. 2019. “Current Status of Landsat Program, Science, and Applications.” Remote Sensing of Environment 225: 127–147. doi:10.1016/j.rse.2019.02.015.
  • Yang, L., S. Jin, P. Danielson, C. Homer, L. Gass, S. M. Bender, A. Case, et al. 2018. “A New Generation of the United States National Land Cover Database: Requirements, Research Priorities, Design, and Implementation Strategies.” ISPRS Journal of Photogrammetry and Remote Sensing 146: 108–123. doi:10.1016/j.isprsjprs.2018.09.006.