Full article: Generalized models for subtropical forest inventory attribute estimations using a rule-based exhaustive combination approach with airborne LiDAR-derived metrics

Formulae display: $MathJax Logo$ ?Mathematical formulae have been encoded as MathML and are displayed in this HTML version using MathJax in order to improve their display. Uncheck the box to turn MathJax off. This feature requires Javascript. Click on a formula to zoom.

ABSTRACT

Airborne LiDAR has been widely used to map forest inventory attributes at various scales. However, most of the developed models on airborne LiDAR-based forest attribute estimations are specific to a study site and forest type (or species), so it is essential to develop predictive models with excellent generalization capabilities across study sites and forest types for the consistent estimation of forest attributes. In this study, 13 LiDAR-derived metrics, which depicted the three-dimensional structural aspects of stand canopy and had clear forest mensuration and ecology significance, were categorized into three groups (height, density, and vertical structure). A rule-based exhaustive combination was then used to construct 86 multiplicative power formulations consisting of 2–5 predictors for estimating the stand volume and basal area. By calibrating and validating these formulations using data from four forest types in the three study regions, we obtained the 24 best local models. Based on these models we proposed a set of accuracy criteria to determine generalized formulations and models. By applying two selection methods (the mean and mixed data methods), we finally archived the eight best region-generalized models, which could be used for estimating the stand volume and basal area of four forest types across study sites on a province scale. This study highlights the accuracy criteria and procedures for developing generalized formulations and models for consistent estimations of forest inventory attributes using airborne LiDAR data.

KEYWORDS:

1. Introduction

Accurate, detailed, and reliable information on forest resources at various scales not only helps forest managers optimize forest management activities; but also helps government departments make strategic decisions. Airborne laser scanning (ALS; also referred to as light detection and ranging (airborne LiDAR)) has the ability to range and penetrate tree canopies accurately (Watt and Watt Citation2013) and capture precise three-dimensional (3D) structural information of the canopy (Coops et al. Citation2021). Since it became commercially applicable in the mid-1990s (Vauhkonen et al. Citation2014), airborne LiDAR has gradually developed as an advanced technological tool for forest inventory. In Scandinavian countries and Canada, airborne LiDAR has been applied to operational forest resource inventories since 2002 (Næsset et al. Citation2004; White et al. Citation2017). In some other countries and regions, airborne LiDAR has been applied in large-scale ecological monitoring and planted forest inventories (Watt and Watt Citation2013; Görgens et al. Citation2015; Silva et al. Citation2017; Matasci et al. Citation2018).

Regional airborne LiDAR forest attribute estimation was performed using an area-based approach (ABA) based on the statistical relationship between the predictor variables derived from airborne LiDAR data and the response variables measured from field plots (Næsset et al. Citation2004). In the past 20 years, numerous studies have been conducted on various forest types and forest attributes (mean stand height, H; mean stand diameter at breast height, DBH; basal area, BA; stand volume, VOL; aboveground biomass, AGB; etc.) and have produced numerous estimation models (Zolkos, Goetz, and Dubayah Citation2013; Latifi et al. Citation2015), including parametric regression and nonparametric models. Most of these studies have been devoted to finding the best prediction model, i.e. maximizing explained variability (e.g. R²), minimizing prediction error (e.g. RMSE), and minimizing systematic bias (Zolkos, Goetz, and Dubayah Citation2013) for specific forest attribute, forest type, and study site (Næsset et al. Citation2005; Hudak et al. Citation2008; Penner, Pitt, and Woods Citation2013; White et al. Citation2017). Parametric models or multiple linear regression (MLR) are the main tools for airborne LiDAR forest parameter estimation (da Silva et al. Citation2020; Novo-Fernández et al. Citation2019). Their main advantages are the simplicity and clarity of the resulting models, but they need to satisfy the assumptions of normality, homoscedasticity, independence, and linearity (García-Gutiérrez et al. Citation2015). Non-parametric models or machine learning models are data-driven models with the ability to capture the implied, potentially nonlinear, and complex relationships between the dependent and independent variables (García-Gutiérrez et al. Citation2014; Gajardo, García, and Riaño Citation2014) and do not depend on any priori assumptions about the data (Millie et al. Citation2012). The main problems with non-parametric models are their higher intrinsic dependence on training data (usually field measurement data) (Zaffalon Citation2005), the high risk of overfitting (Hamedianfar et al. Citation2022; Hawkins Citation2004), and lack of transferability (Zhao et al. Citation2017; García-Gutiérrez et al. Citation2014). Many studies have demonstrated that the estimation accuracies of machine learning models are slightly better than that of multiple linear regression models (van Ewijk et al. Citation2019; García-Gutiérrez et al. Citation2015). However, some studies have also demonstrated the superiority of linear regression models over machine learning models (Xu, Manley, and Morgenroth Citation2018; Gao et al. Citation2018; Watt et al. Citation2015; da Silva et al. Citation2020). This study is dedicated to the development of transferability models among study sites and forest types, so we focused on the parametric models. Given the flexibility of multiplicative models (Hollaus et al. Citation2009), the multivariate multiplicative power models were used to build generalization models in this study.

Multiple linear (generally requires a logarithmic transformation, it is a nonlinear model) and nonlinear regressions are the most commonly used models for estimating forest attributes (Görgens et al. Citation2015; Giannico et al. Citation2016; Montealegre et al. Citation2016; Silva et al. Citation2017). There are various approaches to selecting model variables, such as stepwise regression techniques (Means et al. Citation2000; Treitz et al. Citation2012; Ene et al. Citation2012; Montealegre et al. Citation2016), the random forest approach (Silva et al. Citation2017; Luo et al. Citation2018; van Ewijk et al. Citation2019), the Akaike information criterion (AIC) (Silva et al. Citation2017; Sheridan et al. Citation2015), the Pearson correlation coefficient, and empirical analysis (Zhang, Cao, and She Citation2017; Cao et al. Citation2019), all of which have advantages and limitations. No matter which method is used, the selected model variables varied greatly across study areas, forest types, and stand conditions (Xu, Manley, and Morgenroth Citation2018; Görgens et al. Citation2015; Giannico et al. Citation2016; Montealegre et al. Citation2016; Silva et al. Citation2017; Sheridan et al. Citation2015; Treitz et al. Citation2012; Ene et al. Citation2012; Zhang, Cao, and She Citation2017; Kim et al. Citation2016). Thus, most models are not generalizable and transferable. During the pre-experiment of this study, we performed ten tests using a rule-based combination of 13 LiDAR variables, each with 90% of the plots randomly selected from the same dataset for model calibration. The final results obtained six different model formulations. This experiment indicated that, even with minor changes in the data, the model variables were different and the model structures were unstable. This situation is common to both MLR and machine learning; even though the model, forest type, and forest parameters are the same, the model variables are different in different study areas. (Singh et al. Citation2015; Gleason and Im Citation2012; Liu et al. Citation2021). Therefore, constructing formulations and models applicable to various regions, forest types, and stand structures—i.e. generalized or transferable models – the models built with data from one region, one forest type, or one year are transferable to other study areas, other forest types, or another year – is a current challenge (Knapp et al. Citation2020; Bouvier et al. Citation2015; Fekety, Falkowski, and Hudak Citation2015). Another major problem with existing LiDAR-based models for estimating forest attributes is that some LiDAR-derived variables, such as the (all_return_above_3 m)/(total_1st_return)×100 (Xu, Manley, and Morgenroth Citation2018), 95% density percentile (Montealegre et al. Citation2016), and cubic mean height (Görgens et al. Citation2015), have no clear forest mensuration or ecological significance. This problem is also present in machine learning models (Silva et al. Citation2017; Pearse et al. Citation2017).

Based on the relationship between AGB and DBH in field plots, Asner et al. (Citation2012) proposed a generalized model for estimating forest aboveground carbon density (ACD). This model was developed using data from four tropical forest study areas and has been applied successfully to different tropical regions (Asner et al. Citation2012; Coomes et al. Citation2017; Getzin et al. Citation2017). However, this model had only one LiDAR variable among its three variables; therefore, it was not a genuine LiDAR-based model. Knapp et al. (Citation2020) categorized LiDAR-derived metrics depicting forest structure into four groups: height, density, maximum height, and canopy heterogeneity variables, and obtained optimal prediction models for AGB and BA containing 4–5 variables selected by different combinations of variables. Similar to the model of Asner et al. (Citation2012), non-LiDAR variables, such as maximum stand density in the study area obtained from field plot measurements or expert knowledge, were used in the model of Knapp et al. (Citation2020). However, the idea of model construction was scientific and imaginative. Bouvier et al. (Citation2015) proposed an empirical model with the same model shape and the same four LiDAR metrics depicting the 3D structure of the stand canopy (mean point cloud height and its standard deviation, canopy cover, and coefficient of variation of leaf area density) for forest inventory attribute estimation. The model achieved satisfactory results in estimating different forest attributes for different forest types in France. Thus, it was a generalized model for forest attribute estimation constructed entirely by LiDAR metrics. Yet, questions remain on how to evaluate the generalization performance of a model, how to develop a generalized model, and how to construct a generalized model with high accuracy while following the principles of forest mensuration and ecology. All these issues require additional investigation.

The overall goal of this study was to investigate the procedures for developing generalized predictive models for forest inventory attributes using airborne LiDAR data that can be applied across study sites and forest types. To fulfill this goal, airborne LiDAR data and field plot data were collected from a large subtropical region. Three specific objectives were pursued: (1) to establish the accuracy criteria for determining the generalized models; (2) to develop the best region-generalized formulations and models for estimating the stand volume and basal area across study sites without causing prediction bias at any individual site; and (3) to develop the best type-generalized formulations and models for estimating the stand volume and basal area across study sites and forest types without causing prediction bias at any individual site and forest type.

2. Materials and Methods

2.1. Study area

The study area were the same as the another study of Li et al. (Citation2022), it covered the entire Guangxi Zhuang Autonomous Region (104°28′-112°04′E, 20°54′-26°24′N) in southern China, with an area of 237.6 × 10³ km². The forest area covers 117.21 × 10³ km², which represents 49.3% of the total area of the study area. In this study, airborne LiDAR data acquisition and field plot measurements were carried out over three years in three regions based on financial allocations, namely, the Nanning region (with an area of 22.1 × 10³ km²), the eastern region (128.4 × 10³ km²), and the western region (87.1 × 10³ km²) ().

Figure 1. Location of the study area. (a) Geographic location of study area in China; (b) distribution of field plots in three regions in study area; (c) locations of field plots.

The Tropic of Cancer traverses the central part of the study area, which is in a subtropical monsoon climate zone. The mean annual temperature is 17.6–23.8°C, and the mean annual rainfall is 1080–2760 mm, with 70%–85% of the rainfall concentrated from April to September.

Due to the extensive extent of the study area, the climate, topography, and forest composition of the three regions are different. The elevations in the Nanning region range from 50 to 1760 m above sea level. It has a mixture of mountains, hills, basins, plains, and karst landforms, with forests dominated by eucalyptus plantations (mainly Eucalyptus urophylla and E. grandis × E. urohpylla), natural pine forests (mainly Pinus massoniana), and natural broad-leaved forests, with large karst rocky mountain shrub forests and small planted Chinese fir (Cunninghamia lanceolata) forests. The eastern region has an elevation ranging from 5 m to 2145 m. The area is interspersed with mountains, hills, and plains. The north-central part is dominated by natural broad-leaved forests (typical evergreen broad-leaved forests), pine forests (approximately 80% are natural forests and 20% are planted forests), and planted fir forests, with some eucalyptus plantations. In the central, southern, and southwestern parts of the region, eucalyptus plantations and natural pine forests are dominant, with a few planted fir forests. The elevations in the western region range from 60 m to 2062 m. The northern and northwestern parts of the area are mountainous landforms with natural broad-leaved forests (mainly Quercus variabilis, Q. acutissima, Q. fabri, and Q. aliena), natural pine forests, and planted fir forests, with small eucalyptus plantations. The eastern and south-central parts of the region have a karst landscape with natural broad-leaved and shrub forests in the karst mountains, along with some natural pine forests and eucalyptus plantations. The southern part of the region has a mountainous landscape with natural pine forests and eucalyptus plantations, some natural broad-leaved forests, and a few planted fir forests. According to the 5^th Forest Management Inventory of Guangxi (GX-FMI, 2017–2020), the areas of the fir, pine, eucalyptus, and broad-leaved forests accounted for 16.5%, 17.5%, 24.8%, and 41.2% of the total forest area, respectively. Approximately 80% of the fir forests are pure forests, and the remaining 20% are mixed forests. Approximately 70% of the pine forests are pure forests, and 30% are mixed forests. Almost all the eucalyptus forests are pure forests, and almost all the broad-leaved forests are natural mixed forests.

2.2. Data collection

2.2.1. Field plot data

The forests in the study area were categorized into four types: fir, pine, eucalyptus, and broad-leaved. The cluster-distributed field plots were installed across the study area using the 2015 forest resource database and the ArcGIS 10.0 software based on the representativeness of the forest attributes, such as stand age, mean height, and stand volume. The smallest spatial interval between the plots was 500 m (200 m in the Nanning region). There were 107 clusters of plots, each with 5–30 field plots (). A hand-held dual-frequency differential GNSS receiver was utilized to navigate to the stand at the plot location. The rectangular plot was 30 × 20 m in size, set in a north-south orientation, and was subdivided into four subplots with an area of 15 × 10 m. The boundaries of the plot and subplots were measured and set using a compass and a handheld laser rangefinder and marked by nylon ropes. The northwest and southwest corners of the plot were positioned using a Trimble Global Navigation Satellite System (GNSS) receiver (Trimble Navigation Ltd., Sunnyvale, CA, USA) based on a real-time kinematic (RTK) positioning method. Two RTK-GNSS instruments were used as the base stations, which were located in a nearby open area. Using the post-correction approach, the positioning accuracy was better than 1 m. Within each subplot, we measured the DBH of all live trees with a DBH (1.3 m) greater than or equal to 5 cm using a diameter tape and recorded the tree species. Tree height was measured for three average-height trees and the tallest tree in each subplot using a Haglöf Vertex IV hypsometer. Subplot-level forest attributes included DBH, H, BA, VOL, and stem density. The VOL was calculated using the provincial species-specific allometric equations (Liao and Huang Citation1986), which were based on BA and H as predictors. The following methods were employed to calculate the plot-level forest attributes: the BA and VOL were summed by the subplot-level BA and VOL, respectively, the DBH was calculated for all trees in the four subplots; and the H was calculated using the BA-weighted-average method. Moreover, the dominant species, forest origin, average age, and canopy coverage were measured and recorded at the plot level.

The field plots in the Nanning, eastern, and western regions were measured from October 2016 to January 20 November 201718 to May 2019, and August 2019 to January 2020, respectively. There was a long time interval between the measurement of some plots and the LiDAR data acquisition in the eastern region (over 70% of plots had an interval of over 90 d). In order to characterize the forest structure states during the laser-scanning process, a set of stand growth models was established based on the reviewed field plot data from the Continuous Forest Inventory of Guangxi (CFI-GX) from 2005 to 2015, and the DBH, H, BA, and VOL of the field plots were adjusted via these stand growth models and tree growth rhythm tables (Huang et al. Citation1993, Citation1996). A summary of the field data based on forest type is presented in . For the same forest type, there were differences in the measured forest attributes of the plots in three regions, as shown by the great differences in both the mean and coefficient of variation.

Table 1. Summary statistics of the field plot data. CV is the coefficient of variation.

Download CSV Display Table

2.2.2. LiDAR data

The LiDAR data were collected in three separate acquisitions in the Nanning, eastern, and western regions between October 2016 and April 20 October 201718 and October 2019, and August 2019 and January 2020, respectively. The Riegl VQ-1560 and Riegl VQ-1560i laser-scanning systems (Riegl, Austria) were applied with the same standards to collect LiDAR data in all three regions. The LiDAR survey flight and sensor parameters are shown in . The point clouds were georeferenced to a projection system of the China Geodetic Coordinate System 2000 (CGCS 2000). The mean square error of the laser point cloud height was smaller than 0.15 m.

Table 2. Summary of LiDAR survey flight and sensor parameters.

Download CSV Display Table

Before being delivered to the end users, the raw data was preprocessed by the LiDAR data provider. The TerraScan software (TerraSolid Ltd., Helsinki, Finland) was used to label the point clouds as ground or non-ground returns using the adaptive triangulation network (TIN) filter algorithm. Utilizing ground returns, a digital terrain model (DTM) with a grid cell size of 2 m was finally generated. We removed the influence of topography and obtained DTM-normalized LiDAR point clouds based on the DTM. For the LiDAR height metrics and vertical structure metric, a 2.0-m and a 0.0-m height cutoff were employed in the computation of all features, respectively. All point clouds were used for density metric extraction. To extract LiDAR-derived variables, we used the package version 3.8 of Python (Python Software Foundation)

Height metrics are the significant variables in estimation models of forest attributes (Bouvier et al. Citation2015). The mean height of the point cloud height distribution (Hmean) is one of the most important variables (Asner et al. Citation2012; Lefsky, Cohen, and Spies Citation2001). Hmax, obtained from point clouds or the canopy height model (CHM), is an essential variable characterizing canopy height distribution and can be used to estimate forest attributes (Knapp et al. Citation2020). However, some studies suggested that the variation in Hmax extracted from laser point clouds was extremely unstable and was unsuitable for forest attribute estimation (Gobakken and Næsset Citation2008). Therefore, in this study, the 95^th height percentile (hp95) was used instead of Hmax. Additionally, the standard deviation (Hstdev), and coefficient of variation (Hcv) of the point cloud height distribution, characterized the heterogeneity of canopy height distribution, were also critical variables (Bouvier et al. Citation2015; Silva et al. Citation2017). Stand density depicts the horizontal distribution of trees within a field plot and is an essential variable for describing stand structure. Among various LiDAR-derived density variables, canopy cover (CC), which characterizes the cover of tree stems, branches, and leaves, effectively correlates the LiDAR point cloud with the horizontal distribution of trees and is one of the most commonly used LiDAR density metrics. The 50^th and 75^th canopy density percentiles (dp50 and dp75), which characterize foliage distribution in the middle and upper parts of the canopy, are also commonly used as density metrics (Næsset Citation2014). As subdominant trees and the understory exist under the canopy, LiDAR height and density metrics alone are not sufficient for accurately characterizing the complex canopy structure. In contrast, the leaf area density (LAD) profiles and vertical foliage profiles (VFPs) contain information on the vertical structure of the stand, providing information on canopy vertical heterogeneity and understory vegetation (Hopkinson et al. Citation2013; Lovell et al. Citation2003). In this paper, we extracted the canopy LAD-related metrics, including the mean (LADmean), standard deviation (LADstdev), and coefficient of variation (LADcv), following the methods described by Bouvier et al. (Citation2015). In addition, we calculated the VFP-related variables, including the mean (VFPmean), standard deviation (VFPstdev), and coefficient of variation (VFPcv) following the methods described by Knapp et al. (Citation2020).

2.3. Methods for developing the generalized model

TIn this study, a region-generalized model is defined as one that estimates a specific forest attribute of a forest type across the study sites with an accuracy that meets the given criteria. Similarly, a type-generalized model is defined as one that estimates a specific forest attribute across the forest types and study sites with an accuracy that meets the given criteria. The generalized models in this study involve four classes of models: region-generalized formulation, region-generalized model, type-generalized formulation, and type-generalized model. The difference between the generalized formulation and the generalized model is that the regression parameters of the latter have been determined, while those of the former have not.

The development of the generalized model involves a series of methods and procedures, as shown in . This framework comprises the following steps: (1) the construction, calibration, and validation of the model. Thirteen LiDAR variables were used to construct model formulations using a rule-based exhaustive combination method, and 86 formulations were archived for each forest attribute; (2) after calibrating and validating all 86 models using datasets from each region, the best local model with the smallest rRMSE and the largest R² for a forest inventory attribute of a region and forest type was selected; (3) establishing the accuracy criteria for the generalized models based on the rRMSE and R² of the best local models; (4) using two methods to select the candidates for the best region- or type-generalized formulations; (5) calculating the differences of the rRMSE and R² between the candidates for the best region- or type-generalized formulations and the best local models, and then determining the best region- or type-generalized formulation; (6) using mixed data to calibrate and validate the best generalized formulation, and then determining the best generalized model based on the accuracy criteria of the generalized models. The detailed methods for each step will be described in the following sections.

Figure 2. A flowchart depicting a brief description of the procedures of determining the best region- or type-generalized model for a specific attribute. Note: a) While determining the region-generalized formulation, there are 86 models for each region; while determining the type-generalized formulation, there are four forest types in each region, so there are 4×86 models. b) While determining the region-generalized formulation, there is one best local model for each region; while determining the type-generalized formulation, there are four best local models for each region. c) for the mean method, while determining the region-generalized formulation, average the rRMSE and R² of the models in all regions (three models); while determining the type-generalized formulation, average the rRMSE and R² of the models in all regions and all forest types (12 models). d) for the mean method, while determining the region-generalized formulation, calculate the differences in the rRMSE and R² between the candidates for the best region-generalized models and the three best local models in three regions; while determining the type-generalized formulation, calculate the differences in the rRMSE and R² between the candidates for the best type-generalized models and the 12 best local models in three regions and four forest types. e) While determining the best region-generalized formulation, the mixed data consist of data from three regions of the same forest type; while determining the best type-generalized formulation, the mixed data consist of data from four forest types in three regions.

2.3.1. Model construction

In regional forest resource inventories and monitoring, the VOL is calculated using the following allometric equation:

(1)

V O L = a_{0} B A^{a_{1}} {\overset{ˉ}{H}}^{a_{2}} + ε

(1)

where $V O L$ is the stand volume per hectare (m³ ha⁻¹); $B A$ is the stand basal area per hectare (m² ha⁻¹); $\overset{ˉ}{H}$ is the mean stand height (m); $a_{0}$ , $a_{1}$ , and $a_{2}$ are the regression model parameters; and $ε$ is the error term. According to a previous study (Li and Li Citation2021), Eq. (1) can be written as follows when using LiDAR variables for estimating forest attributes.

(2)

\hat{y} = a_{0} P^{a_{1}} H^{a_{2}} S^{a_{3}} + ε

(2)

where $\hat{y}$ is the forest attribute, which can be VOL, BA, DBH, or H, etc.; P, H, and S are the density variable, height variable, and vertical structure variable extracted from LiDAR data, respectively, and they can be regarded as three groups of LiDAR variables, each group including 1–6 variables.

The 13 above-mentioned LiDAR-derived metrics could be categorized into three groups: the height metric (Ph), the density metric (Pd), and the vertical structure metric group (Pv) (), each group was supposed to capture a different structural aspect of the forest canopy. Based on the forest mensuration and ecological significance of LiDAR variables, and their correlations with the target variables, we specified that Hmean and hp95 were the primary height variables, CC was the primary density variable, and the remaining height and density variables were the secondary variables. The six vertical structure variables were not divided into primary and secondary variables.

Table 3. List of LiDAR-derived metrics used for establishing the prediction model.

Download CSV Display Table

To explore the optimal combinations of variables to estimate the stand volume and basal area based on EquationEquation (2)(2) $\hat{y} = a_{0} P^{a_{1}} H^{a_{2}} S^{a_{3}} + ε$ (2) , we chose 1–2 variables in each variable group and added them to EquationEquation (2)(2) $\hat{y} = a_{0} P^{a_{1}} H^{a_{2}} S^{a_{3}} + ε$ (2) . The selection and combination of the variables are carried out according to the following rules (Li et al. Citation2022):

The variable combination scheme is as follows: 1–2 height variables+1–2 density variables+0–1 vertical structure variable.
The model must comprise at least a primary height variable (Hmean or hp95) and a primary density variable (CC). When two height variables are chosen, only a primary height variable and a secondary height variable can be chosen. When two density variables are chosen, a secondary density variable is chosen next to the CC.
Each variable in the group of vertical structure variables can appear in the model separately.
When the model is composed of two variables, EquationEquation (2)(2) $\hat{y} = a_{0} P^{a_{1}} H^{a_{2}} S^{a_{3}} + ε$ (2) consists of a primary height variable and a primary density variable. When the model comprises three variables, EquationEquation (2)(2) $\hat{y} = a_{0} P^{a_{1}} H^{a_{2}} S^{a_{3}} + ε$ (2) consists of a primary height variable, a primary density variable, and a vertical structure variable. When the model comprises four variables, EquationEquation (2)(2) $\hat{y} = a_{0} P^{a_{1}} H^{a_{2}} S^{a_{3}} + ε$ (2) consists of a primary height variable, a secondary height variable, a primary density variable, and a vertical structure variable. When the model is composed of five variables, EquationEquation (2)(2) $\hat{y} = a_{0} P^{a_{1}} H^{a_{2}} S^{a_{3}} + ε$ (2) consists of a primary height variable, a secondary height variable, a primary density variable, a secondary density variable, and a vertical structure variable.

An exhaustive combination approach was utilized to combine the variables. Finally, 86 formulations consisting of 2–5 variables were obtained, as shown in .

2.3.2. Determining the best local model

Calibration and validation of the models were performed separately for the forest attributes, forest types, and study areas. Each model underwent 50 iterations of calibration and validation. The Gauss-Newton algorithm was used to estimate the model parameters using Python software. For each iteration, 80% of the randomly selected samples from the dataset were used as the calibration dataset, and the remaining 20% were used as the validation dataset. The evaluation statistics for the model include the relative root mean square error (rRMSE) and coefficient of determination (R²). After all 50 iterations, the mean rRMSE and the mean R² were calculated for each model. Repeating the above procedure, all 86 models were calibrated and validated for the forest attribute of a forest type in a study area. The best local model was the one that had the smallest mean rRMSE and the largest mean R² of the validation datasets among all 86 modes. There were 24 best local models for estimating two stand attributes (VOL and BA) of four forest types (Chinese fir, pine, eucalyptus, and broad-leaved forests) in three regions (the eastern, western, and Nanning regions).

2.3.3. Determining the best region-generalized formulation and model

The best region-generalized formulation or model is the formulation or model that has the smallest rRMSE and the largest R² among the region-generalized formulations or models. We proposed two methods: the mean method and the mixed data method, to determine the best region-generalized formulation and model. The difference between the two methods was the determination of the candidate for the best region-generalized formulation, and the remaining methods and steps were the same (). The detailed methods and procedures are as follows.

1) Selection of the candidates for the best region-generalized formulations.

The mean method: For a specific attribute of a specific forest type in three study sites, average the rRMSE and the R² of three models achieved in Section 2.3.2, respectively. The formulation with the smallest mean of rRMSE and the largest mean of R² among all 86 models was the candidate for the best region-generalized formulation.
The mixed data method: The data from three regions of a specific forest type were mixed to calibrate and validate all 86 formulations using a similar method as in Section 2.3.2. The formulation with the smallest rRMSE and the largest R² among all 86 models was the candidate for the best region-generalized formulation.

2) Determining the best region-generalized formulation. Differences ( $Δ r R M S E = \frac{r R M S E_{g} - r R M S E_{l}}{r R M S E_{l}} \times 100$ and $Δ R^{2} = \frac{R_{g}^{2} - R_{l}^{2}}{R_{l}^{2}} \times 100$ ) between the rRMSE and R² of the candidate for the best region-generalized formulation ( $r R M S E_{g}$ and $R_{g}^{2}$ ) and the rRMSE and R² of the best local model ( $r R M S E_{l}$ and $R_{l}^{2}$ ) in three regions were calculated. If the following accuracy criteria are satisfied, the candidate for the best regional-generalized formulation is regarded as the best region-generalized formulation: (a) the differences in rRMSE in all three regions are smaller than 10%, and the differences in R² in all three regions are larger than −10%; and (b) the average difference in rRMSE is smaller than 5%, and the difference in R² is larger than −5% (). Otherwise, the candidate formulation is not the best regional-generalized formulation. Thus, there was no best region-generalized model.

Synthesis selection: If both selection methods yielded a candidate for the best region-generalized formulation, the one with the smallest average difference in rRMSE was the best region-generalized formulation.
Determining the best region-generalized model. Field plots from three regions of a specific forest type were mixed, and 50 repeats of model calibration and validation were performed for the best region-generalized formulation. Given a large number of field plots after data mixing, for each repetition, 70% of randomly selected samples from the mixed dataset were utilized to calibrate the model, and the remaining 30% were used to validate the model. The mean rRMSE and the mean R² of the validation datasets in each region were calculated separately based on the validation data source. The model was confirmed as the best region-generalized model if the differences between the means rRMSE (and mean R²) of the models in three regions and the rRMSE (and mean R²) of the best local model met the accuracy criteria specified in . Otherwise, there was no best region-generalized model.

Table 4. The accuracy criteria for the generalizing formulation and model based on the difference in rRMSE and R² between the candidates for the best region- or type-generalized formulations or models and the best local model.

Download CSV Display Table

2.3.4. Determining the best type-generalized formulation and model

The best type-generalized formulation or model is the formulation or model that has the smallest rRMSE and the largest R² for estimating among the type-generalized formulation or model. The method and procedure for determining the best type-generalized formulation and model are similar to those described in Section 2.3.3 for determining the best region-generalized formulation and model. The main difference is that the former involves not only the study area but also the forest type, and the amount of dataset is much larger than that of the latter, so the methods used to determine the candidates for the best type-generalized formulation are different.

The mean method: For a specific attribute of all four forest types at all three study sites, 12 models were achieved in the above procedures (Section 2.3.2) for each of the 86 formulations. Averaging the rRMSE and the R² of these 12 models, we could thus determine the formulation with the smallest mean of rRMSE and the largest mean of R² as the candidate for the best type-generalized formulation.
The mixed data method: The data from three regions and four forest types were mixed and used to calibrate and validate all 86 formulations (50 repeats). The candidate for the best type-generalized formulation was the one with the smallest mean rRMSE and the largest mean R².

If the differences in the rRMSE (and in the R²) between the candidate for the best type-generalized formulation and the best local models of three regions of four forest types were all smaller than 15% (and larger than −15%) and the average difference in the rRMSE (and in the R²) was smaller than 7.5% (and larger than −7.5%) (as specific in ), the candidate formulation was identified as the best type-generalized formulation. The best type-generalized model was further determined using a procedure similar to that described above.

3. Results

3.1. The best local model for stand attribute prediction

The performances of the 86 VOL estimation formulations varied considerably across the three study sites and the four forest types. For example, in the fir forests, formulations 8, 3, and 21 performed best in the eastern region, while in the western and Nanning regions, the best-performing formulations were formulations 3, 4, and 21, and formulations 27, 28, and 33, respectively. There were seven formulations among the nine best-performing models in the three regions, but none of them were present in all three regions simultaneously. In the pine forests, the three best-performing models were completely different in the three regions. In the eucalyptus and broad-leaved forests, they had seven best-performing formulations in all three regions, but none of them in all three regions simultaneously (). The above results indicated that the performances of all formulations varied widely across regions, even if the forest types were the same.

Table 5. Model validation statistics for the best-performing models for the VOL and BA predictions in the three regions of the four forest types.

Download CSV Display Table

Similar to VOL estimation, the performances of all 86 formulations for BA estimation varied greatly across the regions and forest types, with none ranking among the three best models in all three regions and all forest types simultaneously (). Among the nine best-performing formulations in three regions, there were seven, nine, eight, and seven formulations in the fir, pine, eucalyptus, and broad-leaved forests, respectively.

Further analysis revealed that the difference between the largest rRMSE and the smallest rRMSE did not exceed 10% among the 20 best-performing VOL estimation models in regions and forest types. The differences in R² were similar to those in rRMSE (except for R² of the VOL model for the broad-leaved forests in the eastern region), indicating that the performances of the 20 best-performing VOL estimation models in three regions and four forest types were analogous to each other, although their formulations differed considerably. The performances of the 20 best-performing models for BA estimation in three regions of four forest types were comparable to the performances of the 20 best-performing models for VOL estimation. The best local models for VOL and BA estimation in three study areas and four forest types are shown in , respectively.

Table 6. The best local models for VOL estimation of four forest types in three regions and the best region-generalized model for VOL estimation of four forest types and their goodness-of-fit and validation statistics.

Download CSV Display Table

Table 7. The best local models of BA estimation of four forest types in three study sites and the best region-generalized model of BA estimation of four forest types and their goodness-of-fit and validation statistics.

Download CSV Display Table

3.2. The best region-generalized formulation and model

After averaging the rRMSE and R² of all 86 models for a specific attribute of a specific forest type in three regions, it was found that the performances of all 86 formulations differed greatly from their performances in all three regions. For example, in the VOL estimation of the fir forests, the three best-performing formulations selected by the mean method were 1, 19, and 13, respectively, and model formulation 13 was not among the three best-performing models in all three regions (). Similarly, the performances of all 86 formulations obtained by the mixed data method differed distinctly from their performances in three regions, e.g. the three best-performing formulations in the VOL estimation of the fir forest were formulations 19, 20, and 55, and formulations 20 and 55 were not among the three best-performing models in all three regions. The sixteen candidates for the best region-generalized formulations obtained by two selection methods are shown in . The candidates for the best region-generalized formulations for the VOL and BA estimations obtained by the two selection methods were different, except for the BA estimation in the eucalyptus forests.

Table 8. The number of the candidates for the best region-generalized formulations for the VOL and BA estimations of four forest types selected by the mean method and mixed data method.

Download CSV Display Table

Model formulation 3 was the candidate for the best region-generalized formulation for VOL estimation of the fir forests selected by the mean method. In the eastern region, the rRMSE and R² of this formulation were 18.82% and 0.832, respectively. In the eastern region, the rRMSE and R² of the best local model (model (8)) for the VOL estimation of the fir forests were 18.52% and 0.837, respectively (), and the differences in rRMSE and R² between the two models were 1.62% and −0.60%, respectively. Using the above-mentioned approach, we obtained the differences in rRMSE and R² between the sixteen candidates for the best region-generalized formulations for the VOL and BA estimations of four forest types and the best local models for VOL and BA estimations, as shown in .

Table 9. The difference in rRMSE (δrrmse) and R² (ΔR²) between the sixteen candidates for the best region-generalized formulations for the VOL and BA estimations obtained by two selection methods and the best local models for four forest types in three study sites.

Download CSV Display Table

shows that: (1) the differences in rRMSE between the eight candidates for the best region-generalized formulations of VOL and BA estimations selected by the mean method and the best local models in three regions were all smaller than 10%, except for the pine forests in the western region, for which the difference in the VOL model was 10.26%; (2) the differences in R² between the eight candidates for the best region-generalized formulations of VOL and BA estimations selected by the mean method and the best local models in three regions were all larger than −10%; and (3) the averaged differences in rRMSE in the three regions were smaller than 5% for all four VOL and BA formulations selected by the mean method, and their average differences in R² were larger than −5%. The above result indicated that seven of the eight candidates for the best region-generalized formulation selected by the mean method met the criteria for the region-generalized formulation. The only exception was the VOL estimation model of the pine forests. The performances of the eight candidates for the best region-generalized formulations of VOL and BA estimation selected by the mixed data method were worse than those selected by the mean method, except for the VOL estimation in the fir forests.

Comprehensively considering the performances of all candidate formulations and the fact that the candidate VOL model (model 49) of the pine forest selected by the mean method slightly exceeded the criteria only in the western region, the eight best formulations for VOL and BA estimation were obtained based on two selection approaches, as shown in . The best models for VOL and BA estimations for broad-leaved forests were the same; therefore, there were seven generalized formulations for the VOL and BA estimations in four forest types.

The mixed data from all three regions of the same forest type was used to calibrate and validate the best region-generalized formulations for each forest attribute. The parameter estimates and the goodness-of-fit statistics of the eight best region-generalized models are shown in . The models were referred to as the pending best region-generalized models because their generalizability had not been evaluated.

When calibrating and validating the pending models, the validation datasets were divided into three groups (regions) according to their sources (regions), and the rRMSE and R² of the validation datasets in each region were calculated. The results showed that the variation in rRMSE for the VOL and BA estimation models of the fir, pine, and eucalyptus forests ranged from 16.52% to 23.15%, and the rRMSE of the VOL and BA estimation models of the broad-leaved forest was smaller than 35% (except for the VOL model in the Nanning region) ().

Table 10. The rRMSE and R² of the eight pending best-region-generalized models in three regions of four forest types.

Download CSV Display Table

From , we obtained the differences in rRMSE and R² between the eight pending best region-generalized models and the best local model of each forest type in all regions, as shown in .

Table 11. The differences in rRMSE (δrrmse) and R² (ΔR²) between the eight pending best region-generalized models and the best local models for the stand volume and basal area estimations of four forest types in three regions.

Download CSV Display Table

All eight pending region-generalized models met the criteria for region-generalized models specified in , indicating that these models are generalizable. The region-generalized models shown in could be identified as the region-generalized models for the VOL and BA estimations in three regions. Comparing the model estimates with the field measurements, we found that the model estimates were closer to the 1:1 line (), demonstrating that the region-generalized models provided excellent predictions.

Figure 3. Scatterplots of survey VOL versus predicted VOL (a, c, e, g) and survey BA versus predicted BA (b, d, f, h) of the best region-generalized model of validation data for four forest types. The black, blue, and red dots the indicate validation data from the Eastern, Western, and Nanning regions, respectively.

In addition, shows that the predictions of the two forest attributes of the eucalyptus forest, which had a simple stand structure, were slightly better than those for other forest types. Since the study area was too large to demonstrate the differences in forest inventory attribute estimates, we selected a county for mapping the forest attributes, and the results showed that the estimates of the region-generalized models are very close to those of the best local models ().

Figure 4. Wall-to-wall maps of the stand volume and basal area generated by the models of the pine forest in a selected area of 4550 km²: (a) and (b) are the estimates of the VOL and BA predicted by the region-generalized models, respectively; (c) and (d) are the estimates of the VOL and BA predicted with the best locale models in the Western region, respectively; (e) is the geographic location of the selected area in the study area.

Figure 4. Wall-to-wall maps of the stand volume and basal area generated by the models of the pine forest in a selected area of 4550 km2: (a) and (b) are the estimates of the VOL and BA predicted by the region-generalized models, respectively; (c) and (d) are the estimates of the VOL and BA predicted with the best locale models in the Western region, respectively; (e) is the geographic location of the selected area in the study area.

After performing a logarithmic transformation, eight linear models were built using the same variables as the eight region-generalized multiplicative power models, and the standardized coefficients of the models are shown in .

Table 12. Standardized coefficients of the linear regression (logarithmic conversion) of the best region-generalized model of VOL and BA estimations of four forest types.

Download CSV Display Table

Overall, the primary height variables (Hmean and hp95) contributed the most to the target variables, followed by the density variables (sometimes, the secondary density variables contributed more than the primary density variables), and the vertical structure variables and secondary height variables contributed slightly less.

3.3. The performances of the best type-generalized formulations

The determination of the best type-generalized formulations of a specific attribute involved 12 datasets from three study sites and four forest types. For each attribute, 12 models of each of the 86 formulations were obtained in Section 2.3.1. After averaging and sorting the rRMSE and R² of these 12 models, we determined that formulations 57 and 61 were the candidates for the best type-generalized formulations for VOL and BA estimations selected by the mean method. For the maxing data methods, formulations 52 and 67 were the candidates for the best type-generalized formulations for VOL and BA estimations. The differences in rRMSE and R² between these candidates for the best type-generalized formulations for VOL and BA estimation and the best local models in the three regions and four forest types are shown in .

Table 13. The differences in rRMSE (δrrmse) and R² (ΔR²) between the eight candidates for the best type-generalized formulation of the VOL and BA estimations selected by two selection methods and the best local models in three regions of four forest types.

Download CSV Display Table

Calibrating models 57 and 61 using all the data from four forest types and three regions, the results were that, for the VOL estimation, the rRMSE and R² were 25.56% and 0.562, respectively; for the BA estimation, the rRMSE and R² were 28.78% and 0.489, respectively.

The differences in rRMSE and R² between the candidates for the best type-generalized formulations and the best local models were significantly larger than those of the best region-generalized formulations and models (). For both the VOL and BA predictive models, there were always cases in which some forest types had obvious increases in rRMSE or obvious decreases in R² in some regions. The largest differences in rRMSE between the two candidate VOL formulations and the best local VOL models were 15.12% and 17.20%, respectively. The largest differences (absolute value) in R² between the two candidate BA formulations and the best local models were 18.71% and 15.34%, respectively (). Since the increase in errors (rRMSE) of the two candidate formulations and the decrease in the explanatory power (R²) of the model variables were considerably larger than those of the best local model, the criteria for the best type-generalized formulations specified in were not satisfied. Hence, there were no best type-generalized formulations, and thus the best type-generalized models were not found.

4. Discussion

A generalized model or model formulation would facilitate airborne LiDAR-based estimation of forest attributes across regions and forest types (or species), which would not only be beneficial for the consistent estimation of forest inventory attributes but also help to reduce the number of field plots and the cost of field measurements. The novelties of this study are the accuracy criteria to determine the region- and type-generalized formulations and models for estimating the stand volume and basal area in subtropical forests, and the procedures for determining the generalized formulations and models.

4.1. Accuracy criteria for determining the generalized formulations and models

Consistent with the findings of most studies, the accuracy of the best local models of the sample forest type differed among the three regions (); we suppose the possible reason is that the difference in the 3D structure of the forest canopy and the variation of the target variables vary in different regions (). Therefore, the determination of the generalized model needs to take into account the differences in the structure of forest stands in different forest types and regions, and it is necessary to maintain relatively high accuracy in all regions and all forest types instead of just considering the overall accuracy of the generalizing model. As the study area increases, the accuracy of the generalizing model is generally lower than that of the best local model (); hence, the generalization of the predictive model requires us to make a trade-off between the estimation accuracy and the consistency of the estimate across study sites and forest types, as shown in the study by Bouvier et al. (Citation2015). Some scholars have worked on developing generalized models for airborne LiDAR-based forest attribute estimation to achieve consistent estimates. However, to the best of our knowledge, no accuracy criteria have been proposed for the generalized formulation and model. Since there are four classes of generalized models, we have established four accuracy criteria. These accuracy criteria were based on the rRMSE and R² of the best local model, which took into account the differences in the characteristics of different forest types in different regions. We anticipated that the resulting generalized formulations or models would be highly accurate across all study sites or forest types. Otherwise, the generalized formulations or models would be meaningless.

Model accuracy is affected by numerous factors, such as the extent of the study area, heterogeneity of the forest, sampling method, number and area of the plots, modeling method, and so on (Zolkos, Goetz, and Dubayah Citation2013; Coops et al. Citation2021), among which the number of sample plots (sample size) is an essential factor. Normally, model accuracy gradually increases as the sample sizes increases (Gobakken and Næsset Citation2008; da Silva et al. Citation2020). However, the sample size required for forest attribute estimation varies by forest type, the extent of the study site, sampling, and modeling approach (Næsset Citation2014). In this study, the sample sizes for all forest types in most regions were less than 100; therefore, whether the accuracy of the best local model reached stability or not, additional study is needed.

Our study showed that it is difficult to obtain the best type-generalized formulations and models for the VOL and BA estimations because the accuracy of the candidates for the best type-generalized formulations was significantly lower than that of the best local models in some regions, and did not met the accuracy criteria of the type-generalized formulation. This finding differed from those of some studies (Bouvier et al. Citation2015; Knapp et al. Citation2020). The major reason is that none of the previous studies proposed an accuracy criterion for the generalized model, but only determined its generalizability with the rRMSE and R². In our study, the rRMSE and R² of the pending type-generalized formulation for the VOL estimation were 25.56% and 0.562, respectively. Although the statistics were not too low, the rRMSE was significantly larger than that of the best local models for some forest types in some regions, and the opposite was true for the R². Therefore, we believed that these two pending models had poor generalization capabilities and limited practical value. From the perspective of forest mensuration and ecology, this may be attributed to the various biological and ecological characteristics of different tree species, they lead to the 3D structures of the canopy varying considerably among forest types, e.g. most eucalyptus forests are single-storied pure forests, most broad-leaved forests are multi-storied mixed forests, and the fir and pine forests consist of both single-storied pure forests and multi-storied mixed forests. In addition, since the stem- and stand-form factors vary with tree species, the allometric equations for calculating the stem and stand volumes also vary considerably. Therefore, a small set of LiDAR variables can not accurately characterize the diverse canopy structures of the different forest types. As a result, there is no best type-generalized formulation. One potential solution is to stratify the vertical structure of the forest canopy and then estimate the forest attributes separately by stratum (Zhou et al. Citation2022; Yu et al. Citation2022). Perhaps adding 1–2 LiDAR variables to the model is also an alternative solution.

4.2. Accuracy of generalized model

Only a few studies on airborne LiDAR-based forest attribute estimation in subtropical forests have been reported, and all of them have small study areas. In two studies conducted in planted forests in Guangxi Province (with an area of 5000 ha) and Jiangsu Province (1260 ha), China, the rRMSEs for estimating the stand volume using multiple linear regression models were 21.34% and 16.47%, respectively (Liu et al. Citation2021; Zhang, Cao, and She Citation2017). The results of these two studies are not comparable to this study due to the small extent and homogeneity of the forests. In some studies conducted in temperate forests larger than 5,000 km², the estimation accuracy of forest inventory attributes using parametric models varied considerably, with rRMSE ranging from 11.04% to 46.7% for the stand volume and 13.8% to 37.1% for the basal area (Hauglin et al. Citation2021; Hill, Buddenbaum, and Mandallaz Citation2018; Nilsson et al. Citation2017; Watt and Watt Citation2013; Treitz et al. Citation2012; Nord-Larsen and Schumacher Citation2012; Dalponte et al. Citation2011; Woods, Lim, and Treitz Citation2008). In 34 studies worldwide, the mean residual standard error (RSE) of discrete airborne LiDAR-based aboveground biomass estimation was 27% (Zolkos, Goetz, and Dubayah Citation2013). The comparability of model accuracy between different studies is poor due to the influence of various factors mentioned earlier. In this study, the rRMSEs of the regional-generalized models used for stand volume estimation were 22.65%, 19.88%, 19.16%, and 36.22% for the fir, pine, eucalyptus, and broad-leaved forests, respectively, although they were slightly lower than the best local models. Given the large extent of the study area and the heterogeneity of forest structures in different regions, we believe that the accuracies of the regional-generalized models in this study are achieved as desired.

4.3. Interpretability of the model variables

Numerous LiDAR-derived metrics can be extracted from LiDAR point clouds and CHM. All these metrics have explicit statistical meaning and characterize aspects of the 3D structure of the forest canopy. However, some of them, such as the 5% and 10% percentile heights, the 0.15–1.37 m echo ratio (Fekety, Falkowski, and Hudak Citation2015), and the 10% and 99% density percentiles (Xu, Manley, and Morgenroth Citation2018), have no clear meanings in forest mensuration or ecology. Although these metrics are closely correlated with some specific attributes for certain forest types in certain regions and can be used for forest attribute estimation, these models are poorly interpretable. The 13 LiDAR-derived metrics selected for this study have clear forest mensuration and ecological significance, and accurately depict the 3D structure of the forest canopy in terms of height, density, and vertical structure; thus, the constructed models had excellent interpretability. In addition, the correlation coefficients between these 13 variables are small, effectively reducing the multicollinearity of the independent variables in the model. There are various approaches to selecting model variables (Montealegre et al. Citation2016; Luo et al. Citation2018; van Ewijk et al. Citation2019; Silva et al. Citation2017; Sheridan et al. Citation2015; Zhang, Cao, and She Citation2017; Cao et al. Citation2019), all of which have advantages and limitations. In this study, the rule-based exhaustive combination of 13 LiDAR variables not only enabled the high accuracy of the resulting model but also made the model interpretable in terms of forest mensuration and ecology. Although the calibration and validation of 86 models were computationally intensive (this is no longer an issue today due to the powerful computing power), it had the advantage of achieving the best combination of LiDAR-derived variables suitable for different forest attribute estimations for different forest types (i.e. different canopy structures), and the model variables had excellent explanatory power.

The vertical structure variables are calculated in this paper in terms of gap fractions using the Beer-Lambert law. The clumping effect of leaves might increase the gap probability and thus lead to an estimate of the leaf area index (LAI) being smaller than the actual LAI (Hu et al. Citation2018). Therefore, it is necessary to investigate the physical models based on stochastic radiative transfer theory, such as the model of Detto et al. (Citation2015), to extract the vertical structure information of forest stands more accurately.

The predictors of the best local models varied significantly across study sites and forest types. The major reason may be the differences in the 3D structure of the forest canopy among diverse forest types and diverse study sites. Therefore, future work will focus on classifying forest canopy structures using LiDAR data (Zhou et al. Citation2022) and then developing generalized formulations and models for different canopy structures (single-, double-, and multi-storied) (Yu et al. Citation2022), to further improve model generalization and estimation accuracy while maintaining consistent estimation of forest attributes. Furthermore, despite this study covering a whole province, the extent of the distribution of major tree species is broad regardless of the biogeographic region; e.g. the distribution of Cunninghamia lanceolata and Pinus massoniana cover approximately half of China. Therefore, the adaptability of the region-generalized model needs to be validated in a larger region than a province.

5. Conclusion

We obtained 86 formulations in this study for estimating the stand volume and basal area using a rule-based exhaustive combination of 13 LiDAR-derived metrics that characterized the 3D structure of the forest canopy and had explicit forest mensuration and ecological significance. Then, we obtained the 24 best local models by calibrating and validating these formulations using data from four forest types in the three study sites. We further proposed a set of accuracy criteria for determing the generalized formulations and models based on these models. Finally, by applying two selection methods based on the rRMSE and R2, we achieved the eight best region-generalized formulations and models. However, since the 3D structure of the canopy varies considerably among forest types, it was difficult to achieve type-generalized formulations and models. The accuracy criteria and procedures presented in this paper would be helpful in the development of generalized models for the consistent estimation of forest inventory attributes using airborne LiDAR data. However, the accuracy criteria still need to be further validated in a larger region than a province. Furthermore, additional stratification of forest types (or species) based on the vertical structure is required to obtain stratum-generalized formulations and models for forest attribute estimations that are consistent across forest types and study sites.

CRediT authorship contribution statement

Chungan Li: Conceptualization, Methodology, Calculation, Data collection and analysis, Writing – original draft and – Review and editing, Project administration, Funding acquisition. Zhongchao Chen:and Xiangbei Zhou Calculation, Writing – original draft. Mei Zhou and Zhen Li: Data collection and analysis, Calculation.

Acknowledgement

This project is a part of the Fifth Forest Management Inventory Project of Guangxi Zhuang Autonomous Region (5^th FMI-GX, 2017-2020), China. The airborne LiDAR data acquisition and prepossessing and field data measurement were founded by the finance department of the Guangxi Zhuang Autonomous Region. The authors would like to express their sincere gratitude to Chengling Yang and Yao Liang from the Guangxi Forest Inventory and Planning Institute (FIPI-GX) and 120 field crews who worked on the field measurements. The authors also thank Guangxi 3D Remote Sensing Engineering Technology Co., Ltd., Feiyan Aero Remote Sensing Technology Co., Ltd., Zhongke Remote Sensing Technology Group Co., Ltd., and Guangzhou Jiantong Surveying and Mapping Geographic Information Technology Co., Ltd., which were responsible for the acquisition and preprocessing of the airborne LiDAR data utilized herein. In addition, the authors are grateful to Professor Guofan Shao from the Department of Forestry and Natural Resources of Purdue University in West Laffayette, IN, USA, and Dr. Marc Bouvier from UMR TETIS Irstea-Cirad-AgroParisTech/ENGREF, Maison de la Télédétection en Languedoc-Roussillon, 500, rue J.F. Breton BP 5095, 34196 Montpellier Cedex 05, France, for their advice on the name of the generalization model. We acknowledge the anonymous reviewers and the editor for their insightful suggestions.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The data that support the findings of this study are available at https://doi.org/10.57760/sciencedb.j00001.00467

Additional information

Funding

This work received financial support from the Forest Department of Guangxi Zhuang Autonomous Region, China [GXLYKJ2016-001].

References

Asner, G. P., J. K. Clark, J. Mascaro, G. A. Galindo García, K. D. Chadwick, D. A. Navarrete Encinales, G. Paez-Acosta, et al. 2012. “High-Resolution Mapping of Forest Carbon Stocks in the Colombian Amazon.” Biogeosciences Discussions 9 (7): 2445–27. doi:10.5194/bg-9-2683-2012.
Web of Science ®Google Scholar
Asner, G. P., J. Mascaro, H. C. Muller-Landau, G. Vieilledent, R. Vaudry, M. Rasamoelina, J. S. Hall, and M. van Breugel. 2012. “A Universal Airborne LiDar Approach for Tropical Forest Carbon Mapping.” Oecologia 168 (4): 1147–1160. doi:10.1007/s00442-011-2165-z.
PubMed Web of Science ®Google Scholar
Bouvier, M., S. Durrieu, R. A. Fournier, and J. -P. Renaud. 2015. “Generalizing Predictive Models of Forest Inventory Attributes Using an Area-Based Approach with Airborne LiDar Data.” Remote Sensing of Environment 156: 322–334. doi:10.1016/j.rse.2014.10.004.
Web of Science ®Google Scholar
Cao, L., H. Liu, X. Fu, Z. Zhang, Z. Shen, and H. Ruan. 2019. “Comparison of UAV LiDar and Digital Aerial Photogrammetry Point Clouds for Estimating Forest Structural Attributes in Subtropical Planted Forests.” Forests 10 (2): 145. doi:10.3390/f10020145.
Web of Science ®Google Scholar
Coomes, D. A., M. Dalponte, T. Jucker, G. P. Asner, L. F. Banin, D. F. R. P. Burslem, S. L. Lewis, et al. 2017. “Area-Based Vs Tree-Centric Approaches to Mapping Forest Carbon in Southeast Asian Forests from Airborne Laser Scanning Data.” Remote Sensing of Environment 194: 77–88. doi:10.1016/j.rse.2017.03.017.
Web of Science ®Google Scholar
Coops, N. C., P. Tompalski, T. R. H. Goodbody, M. Queinnec, J. E. Luther, D. K. Bolton, J. C. White, M. A. Wulder, O. R. van Lier, and T. Hermosilla. 2021. “Modelling Lidar-Derived Estimates of Forest Attributes Over Space and Time: A Review of Approaches and Future Trends.” Remote Sensing of Environment 260: 112477. doi:10.1016/j.rse.2021.112477.
Web of Science ®Google Scholar
Dalponte, M., C. Martinez, M. Rodeghiero, and D. Gianelle. 2011. “The Role of Ground Reference Data Collection in the Prediction of Stem Volume with LiDar Data in Mountain Areas.” Isprs Journal of Photogrammetry and Remote Sensing 66: 787–797. doi:10.1016/j.isprsjprs.2011.09.003.
Web of Science ®Google Scholar
da Silva, V. S., C. A. S. Ilva, M. Mohan, A. Cardil, F. E. Rex, G. H. Loureiro, D. R. A. de Almeida, et al. 2020. “Combined Impact of Sample Size and Modeling Approaches for Predicting Stem Volume in Eucalyptus Spp. Forest Plantations Using Field and LiDar Data.” Remote Sensing 12 (9): 1438. doi:10.3390/rs12091438.
Web of Science ®Google Scholar
Detto, M., G. P. Asner, H. C. Muller-Landau, and O. Sonnentag. 2015. “Spatial Variability in Tropical Forest Leaf Area Density from Multireturn Lidar and Modeling.” Journal of Geophysical Research-Biogeoscience 120 (2): 294–309. doi:10.1002/2014JG002774.
Web of Science ®Google Scholar
Ene, L. T., E. Næsset, T. Gobakken, T. G. Gregoire, G. Ståhl, and R. Nelson. 2012. “Assessing the Accuracy of Regional LiDar-Based Biomass Estimation Using a Simulation Approach.” Remote Sensing of Environment 123: 579–592. doi:10.1016/j.rse.2012.04.017.
Web of Science ®Google Scholar
Fekety, P. A., M. J. Falkowski, and A. Hudak. 2015. “Temporal Transferability of LiDar-Based Imputation of Forest Inventory Attributes.” Canadian Journal of Forest Research 45: 422–435. doi:10.1139/cjfr-2014-0405.
Web of Science ®Google Scholar
Gajardo, J., M. García, and D. Riaño. 2014. “Chapter 22 Applications of Airborne Laser Scanning in Forest Fuel Assessment and Fire Prevention.” In Forestry Applications of Airborne Laser Scanning: Concepts and Case Studies, Managing Forest Ecosystems, edited by M. Maltamo, E. Næsset, and J. Vauhkonen, Vol. 27, 439–462. © Springer ScienceCBusiness Media Dordrecht. doi:10.1007/978-94-017-8663-81.
Google Scholar
Gao, Y., D. Lu, G. Li, G. Wang, Q. Chen, L. Liu, and D. Li. 2018. “Comparative Analysis of Modeling Algorithms for Forest Aboveground Biomass Estimation in a Subtropical Region.” Remote Sensing 10 (4): 627. doi:10.3390/rs10040627.
Web of Science ®Google Scholar
García-Gutiérrez, J., E. Gonzalez-Ferreiro, J. C. Riquelme-Santos, D. Miranda, U. Dieguez-Aranda, and R. M. Navarro-Cerrillo. 2014. “Evolutionary Feature Selection to Estimate Forest Stand Variables Using LiDar.” International Journal of Applied Earth Observation and Geoinformation 26: 119–131. doi:10.1016/j.jag.2013.06.005.
Web of Science ®Google Scholar
García-Gutiérrez, J., F. Martínez-Álvarez, A. Troncoso, and J. C. Riquelme. 2015. “A Comparison of Machine Learning Regression Techniques for LiDar-Derived Estimation of Forest Variables.” Neurocomputing 167: 24–31. doi:10.1016/j.neucom.2014.09.091.
Web of Science ®Google Scholar
Getzin, S., R. Fischer, N. Knapp, and A. Huth. 2017. “Using Airborne LiDar to Assess Spatial Heterogeneity in Forest Structure on Mount Kilimanjaro.” Landscape Ecology 32: 1881–1894. doi:10.1007/s10980-017-0550-7.
Web of Science ®Google Scholar
Giannico, V., R. Lafortezza, R. John, G. Sanesi, L. Pesola, and J. Chen. 2016. “Estimating Stand Volume and Above-Ground Biomass of Urban Forests Using LiDar.” Remote Sensing 8: 339. doi:10.3390/rs8040339.
Web of Science ®Google Scholar
Gleason, C. J., and J. Im. 2012. “Forest Biomass Estimation from Airborne LiDar Data Using Machine Learning Approaches.” Remote Sensing of Environment 125: 80–91. doi:10.1016/j.rse.2012.07.006.
Web of Science ®Google Scholar
Gobakken, T., and E. Næsset. 2008. “Assessing Effects of Laser Point Density, Ground Sampling Intensity, and Field Sample Plot Size on Biophysical Stand Properties Derived from Airborne Laser Scanner Data.” Canadian Journal of Forest Research 38: 1095–1109. doi:10.1139/X07-219.
Web of Science ®Google Scholar
Görgens, E. B., P. Packalen, A. G. P. da Silva, C. A. Alvares, O. C. Campoe, J. L. Stape, and L. C. E. Rodriguez. 2015. “Stand Volume Models Based on Stable Metrics as from Multiple ALS Acquisitions in Eucalyptus Plantations.” Annals of Forest Science 72 (4): 489–498. doi:10.1007/s13595-015-0457-x.
Web of Science ®Google Scholar
Hamedianfar, A., C. Mohamedou, A. Kangas, and J. Vauhkonen. 2022. “Deep Learning for Forest Inventory and Planning: A critical Review on the Remote Sensing Approaches so Far and Prospects for Further Applications.” Forestry 95 (4): 451–465. doi:10.1093/forestry/cpac002.
Google Scholar
Hauglin, M., J. Rahlf, J. Schumacher, R. Astrup, and J. Breidenbach. 2021. “Large Scale Mapping of Forest Attributes Using Heterogeneous Sets of Airborne Laser Scanning and National Forest Inventory Data.” Forest Ecosystems 8: 65. doi:10.1186/s40663-021-00338-4.
Web of Science ®Google Scholar
Hawkins, D. M. 2004. “The Problem of Overfitting.” Journal of Chemical Information and Computer Sciences 44: 1–12. doi:10.1021/ci0342472.
PubMedGoogle Scholar
Hill, A., H. Buddenbaum, and D. Mandallaz. 2018. “Combining Canopy Height and Tree Species Map Information for Large‑scale Timber Volume Estimations Under Strong Heterogeneity of Auxiliary Data and Variable Sample Plot Sizes.” European Journal of Forest Research 137 (4): 489–505. doi:10.1007/s10342-018-1118-z.
Web of Science ®Google Scholar
Hollaus, M., W. Wagner, K. Schadauer, B. Maier, and K. Gabler. 2009. “Growing Stock Estimation for Alpine Forests in Austria: A Robust Lidar-Based Approach.” Canadian Journal of Forest Research 39: 1387–1400. doi:10.1139/X09-042.
Web of Science ®Google Scholar
Hopkinson, C., J. Lovell, L. Chasmer, D. Jupp, N. Kljun, and E. van Gorsel. 2013. “Integrating Terrestrial and Airborne Lidar to Calibrate a 3D Canopy Model of Effective Leaf Area Index.” Remote Sensing of Environment 136: 301–314. doi:10.1016/j.rse.2013.05.012.
Web of Science ®Google Scholar
Huang, Z., Z. Huang, C. Zeng, and Z. Huang. 1993. “The Annual Growth Rhythms of Tongmain Pine, a Geographic Seed Source of Pinus Massoniana.” Journal of Central-South Forestry College 13 (1): 98–102. in Chinese. doi:10.14067/j.cnki.1673-923x.1993.01.02.
Google Scholar
Huang, W., B. Zhou, J. Zhao, and C. Wang. 1996. “Study on Annual Growth Rhythm of Fir Stands with Different Silvicultural Density and Its Relationship with Meteorological Factors.” Forestry Science and Technology of Henan Province 2: 4–11. in Chinese.
Google Scholar
Hudak, A. T., N. L. Crookston, J. S. Evans, D. E. Hall, and M. J. Falkowski. 2008. “Nearest Neighbour Imputation of Species-Level, Plot-Scale Forest Structure Attributes from LiDar Data.” Remote Sensing of Environment 112 (5): 2232–2245. doi:10.1016/j.rse.2007.10.009.
Web of Science ®Google Scholar
Hu, R., G. Yan, F. Nerry, Y. Liu, Y. Jiang, S. Wang, Y. Chen, X. Mu, W. Zhang, and D. Xie. 2018. “Using Airborne Laser Scanner and Path Length Distribution Model to Quantify Clumping Effect and Estimate Leaf Area Index.” IEEE Transactions on Geoscience and Remote Sensing 56 (6): 3196–3209. doi:10.1109/TGRS.2018.2794504.
Web of Science ®Google Scholar
Kim, E., W. -K. Lee, M. Yoon, J. -Y. Lee, Y. Son, and K. A. Salim. 2016. “Estimation of Voxel-Based Above-Ground Biomass Using Airborne LiDar Data in an Intact Tropical Rain Forest, Brunei.” Forests 7: 259. doi:10.3390/f7110259.
Web of Science ®Google Scholar
Knapp, N., R. Fischer, V. Cazcarra-Bes, and A. Huth. 2020. “Structure Metrics to Generalize Biomass Estimation from Lidar Across Forest types from Different Continents.” Remote Sensing of Environment 237: 111597. doi:10.1016/j.rse.2019.111597.
Web of Science ®Google Scholar
Latifi, H., E. Fabian, F. E. Fassnacht, J. Müller, A. Tharani, S. Dech, and M. Heurich. 2015. “Forest Inventories by LiDar Data: A Comparison of Single Tree Segmentation and Metric-Based Methods for Inventories of a Heterogeneous Temperate Forest.” International Journal of Applied Earth Observation and Geoinformation 42: 162–174. doi:10.1016/j.jag.2015.06.008.
Web of Science ®Google Scholar
Lefsky, M. A., W. B. Cohen, and T. A. Spies. 2001. “An Evaluation of Alternate Remote Sensing Products for Forest Inventory, Monitoring, and Mapping of Douglas-Fir Forests in Western Oregon.” Canadian Journal of Forest Research 31 (1): 78–87. doi:10.1139/x00-142.
Web of Science ®Google Scholar
Liao, Z., and D. Huang. 1986. “Forest Inventory Handbook of Guangxi, China.” Pages 14-15. Forestry Department of Guangxi Zhuang Autonomous Region. Nanning, Guanxi, China. (in Chinese): Nanning.
Google Scholar
Li, C., and Z. Li. 2021. “Generalizing Predictive Models of Sub-Tropical Forest Inventory Attributes Using an Area-Based Approach with Airborne LiDar Data.” Scientia Silvae Sinicae 57: 23–35. In Chinese. doi:10.11707/j.1001-7488.20211003.
Google Scholar
Liu, H., X. Shen, L. Cao, T. Yun, Z. Zhang, X. Fu, X. Chen, and F. Liu. 2021. “Deep Learning in Forest Structural Parameter Estimation Using Airborne LiDar Data.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14. doi:10.1109/JSTARS.2020.3046053.
Web of Science ®Google Scholar
Li, C., Z. Yu, X. Zhou, M. Zhou, and Z. Li. 2022. “Using the Error-In-Variable Simultaneous Equations Approach to Construct Compatible Estimation Models of Forest Inventory Attributes Based on Airborne LiDar.” Forests 14: 65. doi:10.3390/f14010065.
Web of Science ®Google Scholar
Lovell, J. L., D. L. Jupp, D. S. Culvenor, and N. C. Coops. 2003. “Using Airborne and Ground-Based Ranging Lidar to Measure Canopy Structure in Australian Forests.” Canadian Journal of Remote Sensing 29 (5): 607–622. doi:10.5589/m03-026.
Web of Science ®Google Scholar
Luo, S., J. M. Chen, C. Wang, A. Gonsamo, X. Xi, Y. Lin, M. Qian, et al. 2018. “Comparative Performances of Airborne LiDar Height and Intensity Data for Leaf Area Index Estimation.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 11 (1): 300–310. doi:10.1109/JSTARS.2017.2765890.
Web of Science ®Google Scholar
Matasci, G., T. Hermosilla, M. A. Wulder, J. C. White, N. G. Coops, G. W. Hobart, and H. S. J. Zald. 2018. “Large-Area Mapping of Canadian Boreal Forest Cover, Height, Biomass and Other Structural Attributes Using Landsat Composites and Lidar Plots.” Remote Sensing of Environment 209: 90–106. doi:10.1016/j.rse.2017.12.020.
Web of Science ®Google Scholar
Means, J. E., S. A. Acker, B. J. Fitt, M. Renslow, L. Emerson, and C. J. Hendrix. 2000. “Predicting Forest Stand Characteristics with Airborne Scanning Lidar.” Photogrammetric Engineering and Remote Sensing 66 (11): 1367–1371.
Web of Science ®Google Scholar
Millie, D. F., G. R. Weckman, W. A. Young, J. E. Ivey, H. J. Carrick, and G. L. Fahnenstiel. 2012. “Modeling Microalgal Abundance with Artificial Neural Networks: Demonstration of a Heuristic ‘Grey-Box’ to Deconvolve and Quantify Environmentalinfluences.” Environmental Modelling & Software 38: 27–39. doi:10.1016/j.envsoft.2012.04.009.
Web of Science ®Google Scholar
Montealegre, A. L., M. T. Lamelas, J. de la Riva, A. García-Martín, and F. Escribano. 2016. “Use of Low Point Density ALS Data to Estimate Stand-Level Structural Metrics in Mediterranean Aleppo Pine Forest.” Forestry 89 (4): 373–382. doi:10.1093/forestry/cpw008.
Web of Science ®Google Scholar
Næsset, E. 2014. “Chapter 11 Area-Based Based Inventory Inventory in Norway–From from Innovation Innovation to an Operational Operational Realityreality.” In Forestry Applications of Airborne Laser Scanning: Concepts and Case Studies, Managing Forest Ecosystems, edited by M. Maltamo, E. Næsset, and J. Vauhkonen, Vol. 27, 215–240. © Springer ScienceCBusiness Media Dordrecht. doi:10.1007/978-94-017-8663-81.
Google Scholar
Næsset, E., O. M. Bollandsas, T. Gobakken, and O. M. Bollandsá. 2005. “Comparing Regression Methods in Estimation of Biophysical Properties of Forest Stands from Two Different Inventories Using Laser Scanner Data.” Remote Sensing of Environment 94 (4): 541–553. doi:10.1016/j.rse.2004.11.010.
Web of Science ®Google Scholar
Næsset, E., T. Gobakken, J. Holmgren, H. Hyyppä, J. Hyyppä, M. Maltamo, M. Nilsson, H. Olsson, Å. Persson, and U. Söderman. 2004. “Laser Scanning of Forest Resources: The Nordic Experience.” Scandinavian Journal of Forest Research 19 (6): 482–499. doi:10.1080/02827580410019553.
Web of Science ®Google Scholar
Nilsson, M., K. Nordkvist, J. Jonzén, N. Lindgren, P. Axensten, J. Wallerman, M. Egberth, et al. 2017. “A Nationwide Forest Attribute Map of Sweden Predicted Using Airborne Laser Scanning Data and Field Data from the National Forest Inventory.” Remote Sensing of Environment 194: 447–454. doi:10.1016/j.rse.2016.10.022.
Web of Science ®Google Scholar
Nord-Larsen, T., and J. Schumacher. 2012. “Estimation of Forest Resources from a Country Wide Laser Scanning Survey and National Forest Inventory Data.” Remote Sensing of Environment 119: 148–157. doi:10.1016/j.rse.2011.12.022.
Web of Science ®Google Scholar
Novo-Fernández, A., M. Barrio-Anta, C. Recondo, A. Cámara-Obregón, and C. A. López-Sánchez. 2019. “Integration of National Forest Inventory and Nationwide Airborne Laser Scanning Data to Improve Forest Yield Predictions in North-Western Spain.” Remote Sensing 11: 1693. doi:10.3390/rs11141693.
Web of Science ®Google Scholar
Pearse, G. D., J. Morgenroth, M. S. Watt, and J. P. Dash. 2017. “Optimising Prediction of Forest Leaf Area Index from Discrete Airborne Lidar.” Remote Sensing of Environment 200: 220–239. doi:10.1016/j.rse.2017.08.002.
Web of Science ®Google Scholar
Penner, M., D. G. Pitt, and M. E. Woods. 2013. “Parametric Vs. Nonparametric LiDar Models for Operational Forest Inventory in Boreal Ontario.” Canadian Journal of Remote Sensing 39 (5): 426–443. doi:10.5589/m13-049.
Web of Science ®Google Scholar
Sheridan, R. D., S. C. Popescu, D. Gatziolis, C. L. S. Morgan, and N. -W. Ku. 2015. “Modeling Forest Aboveground Biomass and Volume Using Airborne LiDar Metrics and Forest Inventory and Analysis Data in the Pacific Northwest.” Remote Sensing 7 (1): 229–255. doi:10.3390/rs70100229.
Web of Science ®Google Scholar
Silva, C. A., A. T. Hudak, C. Klauberg, L. A. Vierling, C. Gonzalez‑benecke, S. de Padua Chaves Carvalho, L. C. E. Rodrihue, and A. Cardil. 2017. “Combined Effect of Pulse Density and Grid Cell Size on Predicting and Mapping Aboveground Carbon in Fast‑growing Eucalyptus Forest Plantation Using Airborne LiDar Data.” Carbon Balance and Management 12 (1): 13. doi:10.1186/s13021-017-0081-1.
PubMedGoogle Scholar
Silva, C. A., C. Klauberg, A. T. Hubdak, L. A. Vierling, S. J. Fennema, and P. D. Corte. 2017. “Modeling and Mapping Basal Area of Pinus Taeda L. Plantation Using Airborne LiDar Data.” Anais da Academia Brasileira de Ciencias 89 (3): 1895–1905. doi:10.1590/0001-3765201720160324.
PubMed Web of Science ®Google Scholar
Singh, K. K., G. Chen, J. B. McCarter, and R. K. Meentemeyer. 2015. “Effects of LiDar Point Density and Landscape Context on Estimates of Urban Forest Biomass.” Isprs Journal of Photogrammetry and Remote Sensing 101: 310–322. doi:10.1016/j.isprsjprs.2014.12.021.
Web of Science ®Google Scholar
Treitz, P., K. Lim, M. Woods, D. Pitt, D. Nesbitt, and D. Etheridge. 2012. “LiDar Sampling Density for Forest Resource Inventories in Ontario, Canada.” Remote Sensing 4: 830–848. doi:10.3390/rs4040830.
Web of Science ®Google Scholar
van Ewijk, K., P. Treitz, M. Woods, T. Jones, and J. Caspersen. 2019. “Forest Site and Type Variability in ALS-Based Forest Resource Inventory Attribute Predictions Over Three Ontario Forest Sites.” Forests 10: 226. doi:10.3390/f10030226.
Web of Science ®Google Scholar
Vauhkonen, J., M. Maltamo, R. E. McRoberts, and E. Næsset. 2014. “Introduction to Forestry Applications of Airborne Laser Scanning.” In Forestry Applications of Airborne Laser Scanning: Concepts and Case Studies, Managing Forest Ecosystems, edited by M. Maltamo, E. Næsset, and J. Vauhkonen, 27. © Springer ScienceCBusiness Media Dordrecht. doi:10.1007/978-94-017-8663-81.
Google Scholar
Watt, M. S., J. P. Dash, S. Bhandari, and P. Watt. 2015. “Comparing Parametric and Nonparametric Methods of Predicting Site Index for Radiata Pine Using Combinations of Data Derived from Environmental Surfaces, Satellite Imagery and Airborne Laser Scanning.” Forest Ecology and Management 357: 1–9. doi:10.1016/j.foreco.2015.08.001.
Web of Science ®Google Scholar
Watt, P., and M. S. Watt. 2013. “Development of a National Model of Pinus Radiate Stand Volume from Lidar Metrics for New Zealand.” International Journal of Remote Sensing 34 (15–16): 5892–5904. doi:10.1080/01431161.2013.798053.
Web of Science ®Google Scholar
White, J. C., P. Tompalski, M. Vastaranta, M. A. Wulder, N. Saarinen, C. Stepper, and N. C. Coops. 2017. A Model Development and Application Guide for Generating an Enhanced Forest Inventory Using Airborne Laser Scanning Data and an Area-Based Approach CWFC Information Report FI-X-018. Victoria, BC, Canada: Natural Resources Canada, Canadian Forest Service, Pacific Forestry Centre. doi:10.13140/rg.2.2.26770.96964.
Google Scholar
Woods, M., K. Lim, and P. Treitz. 2008. “Predicting Forest Stand Variables from LiDar Data in the Great Lakes – St. Lawrence Forest of Ontario.” Forestry Chronicle 84 (6): 827–839. doi:10.5558/tfc84827-6.
Web of Science ®Google Scholar
Xu, G., B. Manley, and J. Morgenroth. 2018. “Evaluation of Modelling Approaches in Predicting Forest Volume and Stand Age for Small-Scale Plantation Forests in New Zealand with RapidEye and LiDar.” International Journal of Applied Earth Observation and Geoinformation 73: 386–396. doi:10.1016/j.jag.2018.06.021.
Web of Science ®Google Scholar
Yu, Z., C. -G. Li, K. Su, X. -B. Zhou, and Z. -C. Chen. 2022. “Estimation of Forest Inventory Attributes Based on Stratification of Vertical Forest Structure Using Airborne Laser Scanning Data.” Journal of Guilin University of Technology (in Chinese). https://kns.cnki.net/kcms/detail/45.1375.n.20220429.1213.002.html.
Google Scholar
Zaffalon, M. 2005. “Credible Classification for Environmental Problems.” Environmental Modelling and Software 20 (8): 1003–1012. doi:10.1016/j.envsoft.2004.10.006.
Web of Science ®Google Scholar
Zhang, Z., L. Cao, and G. She. 2017. “Estimating Forest Structural Parameters Using Canopy Metrics Derived from Airborne LiDar Data in Subtropical Forests.” Remote Sensing 9: 940. doi:10.3390/rs9090940.
Web of Science ®Google Scholar
Zhao, K., J. C. Suarez, M. Garcia, T. Hu, C. Wang, and A. Londo. 2017. “Utility of Multitemporal Lidar for Forest and Carbon Monitoring: Tree Growth, Biomass Dynamics, and Carbon Flux.” Remote Sensing of Environment 204: 883–897. doi:10.1016/j.rse.2017.09.007.
Web of Science ®Google Scholar
Zhou, X., C. Li, Z. Yu, Z. Chen, and K. Su. 2022. “Classification of Vertical Forest Structure of Overstory in Subtropical Forest Using Airborne Lidar Data.” Forest Resources Management 1: 106–113. in Chinese. doi:10.13466/j.cnki.lyzygl.2022.01.013.
Google Scholar
Zolkos, S. G., S. J. Goetz, and R. Dubayah. 2013. “A Meta-Analysis of Terrestrial Aboveground Biomass Estimation Using Lidar Remote Sensing.” Remote Sensing of Environment 128: 128 289–298. doi:10.1016/j.rse.2012.10.017.
Web of Science ®Google Scholar

Appendix

Table A1. 86 formulations constructed with a rule-based exhaustive combination of 13 LiDAR variables.

Download CSV Display Table

Generalized models for subtropical forest inventory attribute estimations using a rule-based exhaustive combination approach with airborne LiDAR-derived metrics

ABSTRACT

1. Introduction

2. Materials and Methods

2.1. Study area

2.2. Data collection

2.2.1. Field plot data

Table 1. Summary statistics of the field plot data. CV is the coefficient of variation.

2.2.2. LiDAR data

Table 2. Summary of LiDAR survey flight and sensor parameters.

2.3. Methods for developing the generalized model

2.3.1. Model construction

Table 3. List of LiDAR-derived metrics used for establishing the prediction model.

2.3.2. Determining the best local model

2.3.3. Determining the best region-generalized formulation and model

Table 4. The accuracy criteria for the generalizing formulation and model based on the difference in rRMSE and R2 between the candidates for the best region- or type-generalized formulations or models and the best local model.

2.3.4. Determining the best type-generalized formulation and model

3. Results

3.1. The best local model for stand attribute prediction

Table 5. Model validation statistics for the best-performing models for the VOL and BA predictions in the three regions of the four forest types.

Table 6. The best local models for VOL estimation of four forest types in three regions and the best region-generalized model for VOL estimation of four forest types and their goodness-of-fit and validation statistics.

Table 7. The best local models of BA estimation of four forest types in three study sites and the best region-generalized model of BA estimation of four forest types and their goodness-of-fit and validation statistics.

3.2. The best region-generalized formulation and model

Table 8. The number of the candidates for the best region-generalized formulations for the VOL and BA estimations of four forest types selected by the mean method and mixed data method.

Table 9. The difference in rRMSE (δrrmse) and R2 (ΔR2) between the sixteen candidates for the best region-generalized formulations for the VOL and BA estimations obtained by two selection methods and the best local models for four forest types in three study sites.

Table 10. The rRMSE and R2 of the eight pending best-region-generalized models in three regions of four forest types.

Table 11. The differences in rRMSE (δrrmse) and R2 (ΔR2) between the eight pending best region-generalized models and the best local models for the stand volume and basal area estimations of four forest types in three regions.

Table 12. Standardized coefficients of the linear regression (logarithmic conversion) of the best region-generalized model of VOL and BA estimations of four forest types.

3.3. The performances of the best type-generalized formulations

Table 13. The differences in rRMSE (δrrmse) and R2 (ΔR2) between the eight candidates for the best type-generalized formulation of the VOL and BA estimations selected by two selection methods and the best local models in three regions of four forest types.

4. Discussion

4.1. Accuracy criteria for determining the generalized formulations and models

4.2. Accuracy of generalized model

4.3. Interpretability of the model variables

5. Conclusion

CRediT authorship contribution statement

Acknowledgement

Disclosure statement

Data availability statement

Additional information

Funding

References

Appendix

Table A1. 86 formulations constructed with a rule-based exhaustive combination of 13 LiDAR variables.

Related research

To cite this article:

Download citation

Your download is now in progress and you may close this window

Login or register to access this feature

Information for

Open access

Opportunities

Help and information

Keep up to date

Table 4. The accuracy criteria for the generalizing formulation and model based on the difference in rRMSE and R² between the candidates for the best region- or type-generalized formulations or models and the best local model.

Table 9. The difference in rRMSE (δrrmse) and R² (ΔR²) between the sixteen candidates for the best region-generalized formulations for the VOL and BA estimations obtained by two selection methods and the best local models for four forest types in three study sites.

Table 10. The rRMSE and R² of the eight pending best-region-generalized models in three regions of four forest types.

Table 11. The differences in rRMSE (δrrmse) and R² (ΔR²) between the eight pending best region-generalized models and the best local models for the stand volume and basal area estimations of four forest types in three regions.

Table 13. The differences in rRMSE (δrrmse) and R² (ΔR²) between the eight candidates for the best type-generalized formulation of the VOL and BA estimations selected by two selection methods and the best local models in three regions of four forest types.