
Shading aware DSM generation from high resolution multi-view satellite images

Pages 398-407 | Received 29 Dec 2021, Accepted 13 Sep 2022, Published online: 11 Nov 2022

ABSTRACT

In many cases, Digital Surface Models (DSMs) and Digital Elevation Models (DEMs) are obtained with Light Detection and Ranging (LiDAR) or stereo matching. As an active method, LiDAR is very accurate but expensive, which often limits its use to small-scale acquisition. Stereo matching is suitable for large-scale acquisition of terrain information thanks to the growing number of satellite stereo sensors. However, stereo matching easily underperforms in textureless areas. Accordingly, this study proposes a Shading Aware DSM GEneration method (SADGE) for high-resolution multi-view satellite images. Considering the complementarity of stereo matching and Shape from Shading (SfS), SADGE combines the advantages of both techniques. First, an improved Semi-Global Matching (SGM) technique is used to generate an initial surface expressed as a DSM; the surface is then refined by optimizing an objective function that models the imaging process with the illumination, surface albedo, and surface normal. Different from existing shading-based DEM refinement or generation methods, no information about the illumination or the viewing angle is needed, and the concave/convex ambiguity is avoided because multi-view images are utilized. Experiments with ZiYuan-3 and GaoFen-7 images show that the proposed method generates DSMs of higher accuracy (12.5-56.3% improvement) with a sound overall shape and detailed surface compared with SURE, a software solution for multi-view stereo.

1. Introduction

Digital Surface Models (DSMs) record the elevation values of the object surface on a regular grid. They are base data for many applications, such as hydrologic modeling, urban planning, route selection, and earthwork volume calculation, to name a few. These applications call for DSMs with ever higher spatial resolution and accuracy. Generally, a DSM can be generated with the Light Detection and Ranging (LiDAR) technique or with photogrammetric methods using multi-view optical images (Gabet, Giraudon, and Renouard 1997; Toutin 2004). The latter is becoming popular for large-scale DSM generation because of its low cost, the growing availability of high-resolution satellite images, and their extensive applications. Typically, the photogrammetric method derives elevation values through image matching, which finds the corresponding pixels between images according to the similarity of image texture (Shan et al. 2020). When the image texture is rich, image matching performs well; in textureless areas, however, it may underperform. Consequently, the detailed shape in textureless areas generally cannot be reconstructed effectively with the image matching technique (Furukawa and Hernández 2015).

By contrast, Shape from Shading (SfS) (Frankot and Chellappa 1988; Horn 1970; Horn and Brooks 1986; Lions, Rouy, and Tourin 1993) can recover the detailed shape from one or more images even in textureless areas and is therefore complementary to the image matching technique. Some studies focus on refining existing low-resolution DEMs/DSMs using the SfS technique with single or multiple images (Heipke, Piechullek, and Ebner 2001; Lohse, Heipke, and Kirk 2006; Peng, Zhang, and Shan 2015). However, concave/convex ambiguities can be pronounced when SfS uses a single image, which may lead to wrong shape recovery, and no shading-based method exists for generating a DSM directly from multi-view satellite images. In this paper, we propose a Shading Aware DSM GEneration method (SADGE) for high-resolution multi-view satellite images. First, an improved Semi-Global Matching (SGM) (Hirschmüller 2007) technique is used to generate an initial DSM. Then, the imaging process is modeled as an interaction of the illumination, surface albedo, and surface normal, and an objective function is introduced that considers the multi-view images and all the parameters. Finally, the final DSM is generated by optimizing the objective function. Experiments with ZiYuan-3 (Tang et al. 2013; Li, Wang, and Jiang 2021) and GaoFen-7 (Xie et al. 2020) stereo images show that our method can generate a high-accuracy DSM with fine details.

The paper is organized as follows: Section 2 briefly introduces the related work. Section 3 describes the proposed method in detail, while Section 4 presents the quantitative and qualitative experiments. Section 5 concludes our work.

2. Related work

Image matching determines the corresponding pixels of the same point in two or more images. Usually, image similarity measurements, such as normalized correlation coefficients, Mutual Information (MI), and Census, are used as matching costs to judge photo-consistency. According to the output scene representation, image matching methods can be roughly divided into four types (Furukawa and Ponce 2010): depth map based (Goesele, Curless, and Seitz 2006; Kolmogorov and Zabih 2001; Kwon, Tai, and Lin 2015), voxel based (Faugeras and Keriven 1998; Heise et al. 2015), patch based (Gallup, Frahm, and Pollefeys 2010; Habbecke and Kobbelt 2006; Hou et al. 2018), and mesh based (Esteban and Schmitt 2003; Furukawa et al. 2009). Image matching can reconstruct a highly detailed surface in textured areas (Hu et al. 2021; Liu and Ji 2020; Ma et al. 2021). However, it often performs poorly in textureless areas, where photo-consistency is difficult to measure accurately.

Different from the image matching technique, the SfS technique extracts the surface normal to recover the detailed surface by modeling the imaging process with the illumination, surface albedo, and surface normal. In the early days, the SfS technique was developed mostly for the reconstruction of synthetic objects and planetary surfaces. Over a few decades of development, it has been applied to many other scenes, e.g. human faces (Kemelmacher-Shlizerman and Basri 2010), close-range objects (Panagopoulos, Hadap, and Samaras 2012), and the Earth surface (Peng, Zhang, and Shan 2015). Most SfS-related algorithms are based on a single image. However, SfS is an ill-posed problem given that the illumination, surface albedo, and surface normal are all unknown while the image is the only observation, especially when only one image is available. Therefore, assumptions about the illumination and surface albedo are often imposed to make the problem solvable. For example, Ikeuchi and Horn (1981) assumed uniform surface albedo and known illumination. The introduction of spherical harmonics (Basri and Jacobs 2003) also eases the limits on illumination, but it assumes a distant point light source. Another method, Photometric Stereo (PS) (Woodham 1980), increases the number of illuminations, i.e. it captures images from a fixed position under different illuminations. PS can produce compelling results, but capturing images from a fixed position under different illuminations is difficult in natural scenes, which limits its application range.

As mentioned above, the image matching technique can effectively reconstruct the overall shape in well-textured areas, while the SfS technique can recover the detailed shape even in textureless areas. Thus, it is of great interest to combine the two techniques to generate a detailed surface with high accuracy. One line of work (Liu, Wu, and Wöhler 2018; Kim, Torii, and Okutomi 2016; Peng, Zhang, and Shan 2015; Wu et al. 2018, 2011; Xu et al. 2018) uses the SfS technique to refine an existing surface, derived either from global open terrain data, such as the Shuttle Radar Topography Mission (SRTM) and the ALOS Global Digital Surface Model "ALOS World 3D - 30 m" (AW3D30), or generated through image matching or other methods, such as InSAR. The other line of work (Langguth et al. 2016; Maurer et al. 2018; Quéau et al. 2017) attempts to fuse image matching and SfS into one model and solve them together with close-range images. The former adopts a two-stage strategy: first, initial surface reconstruction with image matching, InSAR, or LiDAR; then, SfS-based surface refinement. The latter combines image matching and SfS into one stage. However, most of these algorithms consider planetary geometry (Lohse, Heipke, and Kirk 2006) or close-range surfaces, such as faces and statues (Langguth et al. 2016). For DEMs on Earth, the work most related to ours is that of Peng, Zhang, and Shan (2015), who refined the 90 m resolution SRTM GL3 to 30 m with a single Landsat image and achieved fine results; however, the concave/convex ambiguities persisted, which might cause wrong recovery of surface detail. In addition, their algorithm assumes a known illumination direction and uniform surface albedo within a local area.

In this paper, we propose a method that aims to accurately generate a detailed, high-resolution DSM from high-resolution multi-view satellite stereo images. Our algorithm needs no illumination direction because spherical harmonics are used to represent the illumination, and the surface albedo is allowed to differ from point to point.

3. Proposed method

This section introduces the proposed method in detail. As shown in Figure 1, an initial surface is first reconstructed with the multi-measurement Semi-Global Matching (mSGM). Then, an objective function considering the illumination, surface albedo, and surface normal is built. Finally, the initial surface is refined by optimizing the objective function. Different from existing DEM refinement algorithms, the refinement procedure in the proposed method uses multiple images, which helps eliminate the convex/concave ambiguity (Quéau et al. 2017).

Figure 1. An overview of the proposed SADGE: (a) the input multiple satellite images, (b) the initial DSM reconstructed with mSGM, (c) the final DSM after shading-based refinement.

3.1 Image matching: mSGM

For the image matching part, we follow Tao's work (Tao 2016) and apply mSGM to obtain the initial DSM from high-resolution satellite images. Similar to traditional SGM, mSGM applies pyramid matching on different image layers. However, mSGM combines two kinds of matching cost, MI and Census, to exploit the advantages of both. Census, a window-based matching cost, is very robust: it measures the similarity of two image windows without the prior probability lookup table required by MI. MI computes the similarity of pixel pairs instead of image windows and therefore has a clear advantage in capturing the structural details of objects. On the top pyramid layer, the matching measure is changed from MI to Census to increase the robustness of SGM; on all other layers, MI remains the matching measure. In addition, mSGM adopts a coarse-to-fine strategy in which matching results are passed from top to bottom: the disparity map matched on the current layer is interpolated to initialize the disparity map of the next layer, and the range of disparity candidates for each pixel is estimated from it. This largely improves matching efficiency and reduces the possibility of mismatches.
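To make the cost design concrete, the following minimal Python sketch (ours, not the authors' code) implements the Census transform and its Hamming matching cost as used on the top pyramid layer; the MI cost, the SGM path aggregation, and the pyramid logic of mSGM are omitted.

```python
import numpy as np

def census_transform(img, radius=2):
    """Encode each pixel by comparing it with its (2r+1)^2 - 1 neighbors."""
    h, w = img.shape
    pad = np.pad(img, radius, mode="edge")
    code = np.zeros((h, w), dtype=np.uint64)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            neigh = pad[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            code = (code << np.uint64(1)) | (neigh < img).astype(np.uint64)
    return code

def census_cost(code_l, code_r, d):
    """Hamming distance between left codes and right codes shifted by disparity d.

    np.roll wraps around at the image border; a real implementation would
    mask the first d columns instead.
    """
    xor = code_l ^ np.roll(code_r, d, axis=1)
    cost = np.zeros(code_l.shape, dtype=np.uint8)
    for _ in range(64):
        cost += (xor & np.uint64(1)).astype(np.uint8)
        xor = xor >> np.uint64(1)
    return cost
```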

3.2 Refinement of DSM

As for the refinement of the DSM, the reflection model must be considered first because the SfS technique is essentially an inversion of the imaging process. As in many other DEM generation or refinement algorithms, the Lambertian reflection model, which reflects light equally in all directions, is used. However, different from those algorithms for the Earth or other planets, spherical harmonics are utilized instead of the classic "direction model" that needs the incident angle of the light source. Spherical harmonics for this purpose were first introduced by Basri and Jacobs (2003) and are widely used in close-range surface refinement. They proved that the images of a convex Lambertian object can be accurately approximated with a low-dimensional linear subspace. One prerequisite applies: the light source should be distant, which means complex illumination or a close light source might affect the stability of the spherical-harmonics approximation. Fortunately, the surface on Earth in satellite images is lit mainly by the sun and the sky, both of which are sufficiently far away. Therefore, second-order spherical-harmonics basis functions are used to represent the illumination in the proposed method as a compromise between accuracy and complexity. Based on the spherical harmonics above, the imaging process can be modeled as

$I_{i,j}^{m} = \rho_{i,j}\left[L_{0}^{m}\ L_{1}^{m}\ \cdots\ L_{8}^{m}\right]\left[1,\ n_{x},\ n_{y},\ n_{z},\ n_{x}n_{y},\ n_{x}n_{z},\ n_{y}n_{z},\ n_{x}^{2}-n_{y}^{2},\ 3n_{z}^{2}-1\right]^{T}$ (1)

where $I_{i,j}^{m}$ is the corresponding pixel value in the $m$th image, $\rho_{i,j}$ is the albedo, $(n_{x}, n_{y}, n_{z})$ is the unit normal vector of the surface point $(i,j)$, and $L_{0}^{m}$ to $L_{8}^{m}$ are the coefficients of the spherical-harmonics basis of the $m$th image. For the same surface point $(i,j)$, the surface albedo $\rho_{i,j}$ and normal $(n_{x}, n_{y}, n_{z})$ are the same across images, whereas the pixel value $I_{i,j}^{m}$ and the spherical-harmonics coefficients $L_{0}^{m}$ to $L_{8}^{m}$ differ from image to image. In addition, different surface points can have different surface albedo, which is closer to the real situation.
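As an illustration, Equation (1) transcribes almost directly into code. The sketch below (our example, assuming unit-length normals and a 9-vector of SH coefficients) renders the pixel value of one surface point:

```python
import numpy as np

def sh_basis(n):
    """Second-order spherical-harmonics basis of Equation (1) at unit normal n."""
    nx, ny, nz = n
    return np.array([1.0, nx, ny, nz,
                     nx * ny, nx * nz, ny * nz,
                     nx ** 2 - ny ** 2, 3.0 * nz ** 2 - 1.0])

def render_pixel(albedo, n, L):
    """Equation (1): I = rho * (L . b(n)), with L the nine SH coefficients."""
    return albedo * np.dot(L, sh_basis(n))

# Example: a horizontal surface patch (normal straight up) under some lighting.
L = np.array([0.8, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.1])
print(render_pixel(0.6, (0.0, 0.0, 1.0), L))  # 0.6 * (0.8 + 0.5 + 0.2) = 0.9
```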

Considering the imaging process above, an objective function is built:

$\min_{dz,\,\rho,\,L}\ E_{data}(dz,\rho,L) + \alpha E_{geometry}(dz) + \beta E_{albedo}(\rho)$ (2)

where $dz$ is the change in the Z-axis coordinate of the surface point, $\rho$ is the albedo of the surface point, and $L$ denotes the coefficients of the spherical-harmonics basis. $\alpha$ and $\beta$ are weights that balance the data term $E_{data}$, the geometry smoothness term $E_{geometry}$, and the reflectance smoothness term $E_{albedo}$. The three terms are introduced in detail below.

The data term $E_{data}$ measures the difference between the observation, i.e. the image pixel value, and the estimate, i.e. the inverse-rendered pixel value.

$E_{data}(dz,\rho,L) = \sum_{m=0}^{M}\sum_{(i,j)\in S}\frac{\left\|I_{i,j}^{m}(Z+dz) - R_{i,j}^{m}(Z+dz,\rho,L)\right\|}{M}$ (3)

where $M$ is the number of visible satellite images, $S$ is the initial surface, i.e. the DSM reconstructed by mSGM, $(i,j)$ denotes the horizontal coordinates of the surface point, $Z$ is the vertical coordinate of the surface point, $I_{i,j}^{m}$ is the corresponding image pixel value of the surface point $(i,j)$, and $R_{i,j}^{m}$ is the inverse-rendered pixel value of the surface point $(i,j)$. Notably, the satellite images used in this paper are captured from far away and the view angles are small; thus, a surface point is considered visible as long as it lies within the range of the image. In addition, $I_{i,j}^{m}$ is re-computed as $Z+dz$ changes, as is $R_{i,j}^{m}$, which is defined in Equation (1) from the illumination, surface albedo, and surface normal. The data term $E_{data}$ follows the principle that the inverse-rendered pixel value should equal the observed pixel value.
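A sketch of how the data term might be evaluated is given below. It reuses sh_basis from the previous sketch; project and sample are hypothetical helpers standing in for the sensor model (e.g. RPC projection) and bilinear image sampling, and in the full method the normals would be recomputed from $Z+dz$ rather than passed in fixed:

```python
import numpy as np

def data_residuals(dz, albedo, L, Z, images, normals, project, sample):
    """One residual per (surface point, image): observed minus rendered, / M.

    project(m, i, j, z) -> image coordinates of point (i, j) at height z in
    image m; sample(img, xy) -> bilinear pixel read. Both are hypothetical
    stand-ins for the real sensor model and interpolation.
    """
    M = len(images)
    res = []
    for m, img in enumerate(images):
        for (i, j), z in np.ndenumerate(Z):
            obs = sample(img, project(m, i, j, z + dz[i, j]))
            pred = albedo[i, j] * np.dot(L[m], sh_basis(normals[i, j]))
            res.append((obs - pred) / M)
    return np.asarray(res)
```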

To keep the surface smooth and eliminate the obvious errors from mSGM, the geometry smoothness term $E_{geometry}$ is set.

$E_{geometry}(dz) = \sum_{(i,j)\in S}\left(\frac{\sum_{(i',j')\in A_{ij}} w_{i'j'}\left\|\frac{(Z+dz)_{ij} - (Z+dz)_{i'j'}}{gsd}\right\|}{N_{A_{ij}}} + \gamma\left\|(Z+dz)_{ij} - G_{ij}\right\|\right)$ (4)

where $A_{ij}$ is the neighborhood of the surface point $(i,j)$; $(i',j')$ is a neighbor point belonging to $A_{ij}$; $w_{i'j'}$ is the weight of the neighbor point $(i',j')$; $gsd$ is the resolution of the DSM; $N_{A_{ij}}$ is the number of neighbor points in $A_{ij}$, which is 8 because a 3 × 3 window is chosen as the neighborhood in our experiment; $(Z+dz)_{ij}$ and $(Z+dz)_{i'j'}$ are the Z-axis coordinates of the surface point $(i,j)$ and its neighbor $(i',j')$, respectively; $G_{ij}$ is the vertical coordinate of the surface point $(i,j)$ filtered with a guided filter; and $\gamma$ is a weight. The first term smooths the surface while keeping sharp edges; the second term eliminates possible obvious errors while, like the first term, respecting image edges. $w_{i'j'}$ and $G_{ij}$ are calculated with a guided filter that uses the image as guide to filter the DSM. As mentioned above, a 3 × 3 window serves as the neighborhood of a surface point. However, this window is not big enough when an obvious error covers an area rather than a single point, and enlarging the neighborhood increases memory use and complexity. Thus, the second term is computed with a large window, i.e. 10 × 10 in our experiment. Accordingly, the second term of the geometry smoothness term eliminates the obvious errors in the initial DSM, whereas the first term keeps the sharp edges in the image. The calculation of $w_{i'j'}$ is shown below:

$w_{i'j'} = \frac{1}{N_{A_{ij}}^{2}}\sum_{(i,j),(i',j')\in A_{ij}}\left(1 + \frac{(I_{ij}-\mu)(I_{i'j'}-\mu)}{\sigma^{2}+\varepsilon}\right)$ (5)

where $\varepsilon$ is a regularization parameter, and $\mu$ and $\sigma^{2}$ are the mean and variance of the image values in $A_{ij}$, respectively.
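For illustration, the weight of Equation (5) for a single pixel pair can be sketched as follows (our simplification to one shared 3 × 3 window at an interior pixel; the full guided-filter kernel sums over every window containing both pixels):

```python
import numpy as np

def guided_weight(I, ij, kl, eps=1e-3, radius=1):
    """Equation (5) for one pixel pair ij, kl, restricted to the single
    (2*radius+1)^2 window centered on ij; I is the guide image."""
    i, j = ij
    win = I[i - radius:i + radius + 1, j - radius:j + radius + 1]
    mu, var = win.mean(), win.var()
    return (1.0 + (I[ij] - mu) * (I[kl] - mu) / (var + eps)) / win.size ** 2
```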

As for the surface albedo $\rho$ in Equation (2), it is allowed to differ between surface points. However, $\rho$ is multiplied with the spherical harmonics, as shown in Equation (1). To better separate the surface albedo $\rho$ from the spherical harmonics, an albedo smoothness term is set:

$E_{albedo}(\rho) = \sum_{(i,j)\in S}\frac{\sum_{(i',j')\in A_{ij}} e^{-k\left\|I_{ij}^{0}-I_{i'j'}^{0}\right\|^{2}}\left\|\rho_{ij}-\rho_{i'j'}\right\|}{N_{A_{ij}}}$ (6)

where $k$ is a constant, $I_{ij}^{0}$ and $I_{i'j'}^{0}$ are the mean pixel values over all visible images for the surface points $(i,j)$ and $(i',j')$, respectively, and $\rho_{ij}$ and $\rho_{i'j'}$ are the albedos of the surface points $(i,j)$ and $(i',j')$, respectively.
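A minimal sketch of the albedo smoothness residuals, assuming I0 holds the per-point mean intensity over all visible images and using k = 10 as in the experiments (the division by $N_{A_{ij}}$ is omitted for brevity):

```python
import numpy as np

def albedo_residuals(rho, I0, k=10.0):
    """Equation (6) residuals: neighbors with similar mean intensity I0 are
    pulled toward similar albedo; each pixel pair is visited once."""
    h, w = rho.shape
    res = []
    for i in range(h):
        for j in range(w):
            for di, dj in ((0, 1), (1, 0), (1, 1), (1, -1)):  # half of 3x3 hood
                ii, jj = i + di, j + dj
                if 0 <= ii < h and 0 <= jj < w:
                    wgt = np.exp(-k * (I0[i, j] - I0[ii, jj]) ** 2)
                    res.append(wgt * (rho[i, j] - rho[ii, jj]))
    return np.asarray(res)
```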

After the objective function is built, the Levenberg-Marquardt algorithm implemented in the Ceres Solver (Agarwal and Mierle 2018) is used to optimize it, i.e. the best coefficients of the spherical-harmonics basis, surface albedo, and normal are determined under the geometry and albedo smoothness constraints.
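Since Ceres is a C++ library, a compact Python analogue can convey the structure of the optimization: the three residual blocks of Equations (3), (4), and (6) are stacked, weighted by the square roots of α and β so that their squared norms reproduce Equation (2), and solved with Levenberg-Marquardt. The residual functions and the unpack helper are assumptions standing in for the real implementation:

```python
import numpy as np
from scipy.optimize import least_squares

def solve_sadge(x0, unpack, data, geometry, albedo, alpha=0.1, beta=0.3):
    """Minimize Equation (2) as one stacked nonlinear least-squares problem."""
    def residuals(x):
        dz, rho, L = unpack(x)  # flat parameter vector -> (dz, albedo, SH coeffs)
        return np.concatenate([
            data(dz, rho, L),               # E_data, Equation (3)
            np.sqrt(alpha) * geometry(dz),  # alpha * E_geometry, Equation (4)
            np.sqrt(beta) * albedo(rho),    # beta * E_albedo, Equation (6)
        ])
    # Levenberg-Marquardt, mirroring the Ceres Solver setup in the paper
    return least_squares(residuals, x0, method="lm")
```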

4. Experiment and discussion

Two sets of experiments are designed to verify the effectiveness of the proposed method. The first experiment is conducted with ZiYuan-3 stereo images of a desert area. However, no ground control points or high-accuracy DEM exist for quantitative evaluation, so a qualitative evaluation is made. The qualitative comparison shows that an image matching method like mSGM can effectively reconstruct the overall shape; on closer inspection, however, the DSM generated by mSGM is noisy and contains several obvious errors. By contrast, the proposed method, which combines image matching and the SfS technique, effectively reconstructs the overall shape while recovering the detailed shape and eliminating some of the obvious errors.

The second experiment is conducted with GaoFen-7 and ZiYuan-3 stereo images. For the GaoFen-7 experiment area, a 2.5 m resolution DSM obtained with high-accuracy LiDAR is used as the reference. For the ZiYuan-3 images, a DSM of Sainte-Maxime, France, derived from the French DSM produced from very high resolution aerial images, is used to evaluate the DSM generated by the proposed method quantitatively. Its spatial resolution has been reduced from 0.4 to 10 m; surface details may therefore be missing, but the accuracy remains sufficiently high for it to serve as a reference. As a baseline, we choose SURE (Rothermel and Wenzel 2012), a software solution for multi-view stereo. The libtsgm library, which provides the core dense-matching functionality of SURE, is used to compute the disparity maps, while the generation of point clouds from the disparity maps and the generation of the gridded DSM are the same as in the proposed method.

For the GaoFen-7 and ZiYuan-3 stereo images, mSGM is first used to generate the initial DSMs. Then, the initial DSMs are refined by optimizing the objective function above; the final spatial resolutions of the reconstructed DSMs are 1 and 5 m, respectively. The proposed method is implemented in C++ with two external dependencies: Ceres Solver (Agarwal and Mierle 2018) and OpenCV (Bradski 2000). The experiments run on a standard Windows 10 computer with an Intel Xeon CPU and 64 GB of memory, without GPU optimization. Throughout the experiments, we use the same values for all optimization parameters of SADGE: α = 0.1, β = 0.3, and k = 10, following the basic rule that no term in the objective function should dominate the others.

To quantitatively evaluate the proposed method, three areas with different land covers are chosen as experiment areas: field, urban, and hill. SURE (Rothermel and Wenzel 2012) is again used as the baseline against which the accuracy of the proposed SADGE is verified.

For the GaoFen-7 images, a 300 × 300 m field area is chosen for quantitative evaluation. As shown in Figure 2, the DSM reconstructed by the proposed SADGE is more similar to the reference DSM than the DSM reconstructed by SURE. The DSM reconstructed by SURE contains considerable noise, whereas the DSM generated by the proposed SADGE is considerably flatter. As for detailed shape, the proposed SADGE successfully recovers details such as the road in the field area. This result demonstrates that the proposed method can not only recover fine details but also reduce noise in the field area.

Figure 2. Results of the field area with the GaoFen-7 images. (a) The reference image, (b) the ground truth obtained with LiDAR, (c) the DSM generated with SURE, (d) the DSM generated with the proposed SADGE.

For the ZiYuan-3 images, two 2.5 × 2.5 km areas with urban and hill land covers are chosen for quantitative evaluation. As shown in Figure 3, the DSM reconstructed by the proposed SADGE is "sharper" than that reconstructed by SURE in the urban area. Compared with SURE, the proposed SADGE evidently reconstructs more detailed shapes, especially along the edges of buildings. In addition, the DSM by the proposed SADGE is smoother than that by SURE in the flat areas shown in the second and third columns of Figure 3.

Figure 3. Results for the urban area in Sainte-Maxime, France. From top to bottom: the nadir ZiYuan-3 image (a), the rendered reference DSM based on the French DSM (b), the rendered DSM generated by the proposed method (c), and the rendered DSM generated by SURE (d). The second and third columns correspond to the upper and lower small squares in the scene, respectively.

As shown in Figure 4, the DSM reconstructed by the proposed SADGE is "clearer" than that reconstructed by SURE in the hill area. Compared with SURE, the proposed SADGE reconstructs the shapes better and is more similar to the reference, especially in the second and third columns. As mentioned above, the spatial resolution of the reference DSM has been reduced from 0.4 to 10 m, so a lack of detail is anticipated; the proposed SADGE nevertheless reconstructs more detailed shapes consistent with the nadir image.

Figure 4. Results for the hill area in Sainte-Maxime, France. From top to bottom: the nadir ZiYuan-3 image (a), the rendered reference DSM based on the French DSM (b), the rendered DSM generated by the proposed method (c), and the rendered DSM generated by SURE (d). The second and third columns correspond to the left and right small squares in the scene, respectively.

Figure 5 further shows the cumulative distribution of the DSM errors for the DSMs reconstructed by SADGE and SURE. SADGE clearly has more points with small DSM errors than SURE in all the experiment areas.

Figure 5. Cumulative distribution of the DSM errors of the DSMs generated by SADGE and SURE in the field (a), urban (b), and hill (c) areas.

To evaluate the accuracies of the DSMs reconstructed by the proposed SADGE and SURE, Table 1 lists the mean, standard deviation, and root mean square of the DSM errors in the field, urban, and hill experiment areas. As shown in Table 1, the proposed SADGE achieves the highest accuracy in all evaluated metrics, improving on SURE by about 56.3% in the field area, 15.7% in the urban area, and 12.5% in the hill area in terms of Root Mean Square Error (RMSE).

Table 1. Quantitative evaluation of SADGE and SURE.

5. Conclusions

This study proposes a novel Shading Aware DSM GEneration (SADGE) method for high-resolution satellite images. The DSM is reconstructed by combining image matching and SfS: the initial surface is reconstructed with image matching, and the final DSM is obtained by refining it with SfS. For image matching, an improved semi-global matching method is used. DSM refinement is achieved by solving an objective function that considers the illumination, surface albedo, and surface normal.

Two kinds of satellite images are used to evaluate the proposed SADGE: ZiYuan-3 and GaoFen-7 images. To better evaluate the reconstruction accuracy, three areas with different types of land cover are used: field, urban, and hill. All the experiments demonstrate that the method recovers more detailed shapes and improves the accuracy of the DSM by up to 56.3% in the field area, 15.7% in the urban area, and 12.5% in the hill area compared with SURE, a software solution for multi-view stereo.

Given the compelling performance of combining image matching and shading in DSM reconstruction, this study recommends treating shading as an essential component of DSM generation, as in SADGE. However, the assumptions of SADGE are still not ideal, e.g. the Lambertian reflectance model and the absence of shadow handling. Exploring more accurate reflectance models and better shadow handling is therefore interesting future work.

Disclosure statement

No potential conflict of interest was reported by the authors.

Data availability statement

The Sainte-Maxime dataset used in the study is available from https://www.isprs.org/data/zy-3/Default-HongKong-StMaxime.aspx, other datasets are available from China Center For Resources Satellite Data and Application (CRESDA), http://www.cresda.com/CN/index.shtml, but restrictions apply to the availability of these data, which were used under license for this study.

Additional information

Funding

This work was supported by the National Natural Science Foundation of China [grant number 41801390] and the National Key R&D Program of China [grant number 2018YFD1100405].

Notes on contributors

Zhihua Hu

Zhihua Hu received the PhD degree in photogrammetry and remote sensing from Wuhan University in 2020. His current research interests include mesh refinement and 3D reconstruction from multi-view images.

Pengjie Tao

Pengjie Tao is currently an associate research fellow. His research interests include photogrammetry, registration of optical images and LiDAR point clouds, and 3D reconstruction from multi-view images.

Xiaoxiang Long

Xiaoxiang Long is currently a research fellow. His research interests include satellite photogrammetry and 3D reconstruction from multi-view images.

Haiyan Wang

Haiyan Wang is currently an engineer in the field of satellite mapping. His research interests include satellite photogrammetry and 3D reconstruction from multi-view images.

References

  • Agarwal, S., K. Mierle, and The Ceres Solver Team. 2018. “Ceres Solver.” http://ceres-solver.org
  • Basri, R., and D. W. Jacobs. 2003. “Lambertian Reflectance and Linear Subspaces.” IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (2): 218–233. doi:10.1109/TPAMI.2003.1177153.
  • Bradski, G. 2000. “The OpenCV Library.” Dr. Dobb’s Journal: Software Tools for the Professional Programmer.
  • Esteban, C. H., and F. Schmitt. 2003. “Silhouette and Stereo Fusion for 3D Object Modeling.” In Fourth International Conference on 3-D Digital Imaging and Modeling (3DIM), 46–53. doi:10.1109/IM.2003.1240231.
  • Faugeras, O., and R. Keriven. 1998. “Variational Principles, Surface Evolution, Pdes, Level Set Methods, and the Stereo Problem.” IEEE Transactions on Image Processing 7 (3): 336–344. doi:10.1109/83.661183.
  • Frankot, R. T., and R. Chellappa. 1988. “A Method for Enforcing Integrability in Shape from Shading Algorithms.” IEEE Transactions on Pattern Analysis and Machine Intelligence 10 (4): 439–451. doi:10.1109/34.3909.
  • Furukawa, Y., B. Curless, S. M. Seitz, and R. Szeliski. 2009. “Manhattan-World Stereo.” In IEEE Conference on Computer Vision and Pattern Recognition, 1422–1429. doi: 10.1109/CVPR.2009.5206867.
  • Furukawa, Y., and C. Hernández. 2015. “Multi-View Stereo: A Tutorial.” Foundations and Trends in Computer Graphics and Vision 9 (1–2): 1–148. doi:10.1561/0600000052.
  • Furukawa, Y., and J. Ponce. 2010. “Accurate, Dense, and Robust Multiview Stereopsis.” IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (8): 1362–1376. doi:10.1109/TPAMI.2009.161.
  • Gabet, L., G. Giraudon, and L. Renouard. 1997. “Automatic Generation of High Resolution Urban Zone Digital Elevation Models.” ISPRS Journal of Photogrammetry and Remote Sensing 52: 33–47. doi:10.1016/S0924-2716(96)00030-5.
  • Gallup, D., J. M. Frahm, and M. Pollefeys. 2010. “Piecewise Planar and Non-Planar Stereo for Urban Scene Reconstruction.” In IEEE Conference on Computer Vision and Pattern Recognition, 1418–1425. doi:10.1109/CVPR.2010.5539804.
  • Goesele, M., B. Curless, and S. M. Seitz. 2006. “Multi-View Stereo Revisited.” In IEEE Conference on Computer Vision and Pattern Recognition, 2402–2409. doi: 10.1109/CVPR.2006.199.
  • Habbecke, M., and L. Kobbelt. 2006. “Iterative Multi-View Plane Fitting.” In International Fall Workshop of Vision, Modeling,and Visualization, 73–80.
  • Heipke, C., C. Piechullek, and H. Ebner. 2001. “Simulation Studies and Practical Tests Using Multi-Image Shape from Shading.” ISPRS Journal of Photogrammetry and Remote Sensing 56: 139–148. doi:10.1016/S0924-2716(01)00038-7.
  • Heise, P., B. Jensen, S. Klose, and A. Knoll. 2015. “Variational Patchmatch Multiview Reconstruction and Refinement.” In IEEE International Conference on Computer Vision (ICCV), 882–890. doi:10.1109/ICCV.2015.107.
  • Hirschmüller, H. 2007. “Stereo Processing by Semiglobal Matching and Mutual Information.” IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (2): 328–341. doi:10.1109/TPAMI.2007.1166.
  • Horn, B. K. P. 1970. Shape from Shading: A Method for Obtaining the Shape of a Smooth Opaque Object from One View. Cambridge, MA: Massachusetts Institute of Technology.
  • Horn, B. K. P., and M. J. Brooks. 1986. “The Variational Approach to Shape from Shading.” Computer Vision, Graphics, and Image Processing 33 (2): 174–208. doi:10.1016/0734-189X(86)90114-3.
  • Hou, Y., J. Peng, Z. Hu, P. Tao, and J. Shan. 2018. “Planarity Constrained Multi-View Depth Map Reconstruction for Urban Scenes.” ISPRS Journal of Photogrammetry and Remote Sensing 139: 133–145. doi:10.1016/j.isprsjprs.2018.03.003.
  • Hu, Z., Y. Hou, P. Tao, and J. Shan. 2021. “IMGTR: Image-Triangle Based Multi-View 3D Reconstruction for Urban Scenes.” ISPRS Journal of Photogrammetry and Remote Sensing 181: 191–204. doi:10.1016/j.isprsjprs.2021.09.009.
  • Ikeuchi, K., and B. K. P. Horn. 1981. “Numerical Shape from Shading and Occluding Boundaries.” Artificial Intelligence 17 (1–3): 141–184. doi:10.1016/0004-3702(81)90023-0.
  • Kemelmacher-Shlizerman, I., and R. Basri. 2010. “3D Face Reconstruction from a Single Image Using a Single Reference Face Shape.” IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (2): 394–405. doi:10.1109/TPAMI.2010.63.
  • Kim, K., A. Torii, and M. Okutomi. 2016. “Multi-View Inverse Rendering Under Arbitrary Illumination and Albedo.” In European Conference on Computer Vision, 750–767. Cham: Springer. doi:10.1007/978-3-319-46487-9_46.
  • Kolmogorov, V., and R. Zabih. 2001. “Computing Visual Correspondence with Occlusions Using Graph Cuts.” In IEEE/CVF International Conference on Computer Vision (ICCV), 508–515. doi: 10.1109/ICCV.2001.937668.
  • Kwon, H., Y. W. Tai, and S. Lin. 2015. “Data-Driven Depth Map Refinement via Multi-Scale Sparse Representation.” In IEEE Conference on Computer Vision and Pattern Recognition, 159–167. doi: 10.1109/CVPR.2015.7298611.
  • Langguth, F., K. Sunkavalli, S. Hadap, and M. Goesele. 2016. “Shading-Aware Multi-View Stereo.” In European Conference on Computer Vision, 469–485. Cham: Springer. doi:10.1007/978-3-319-46487-9_29.
  • Li, D., M. Wang, and J. Jiang. 2021. “China’s High-Resolution Optical Remote Sensing Satellites and Their Mapping Applications.” Geo-Spatial Information Science 24 (1): 85–94. doi:10.1080/10095020.2020.1838957.
  • Lions, P. L., E. Rouy, and A. Tourin. 1993. “Shape-From-Shading, Viscosity Solutions and Edges.” Numerische Mathematik 64: 323–353. doi:10.1007/BF01388692.
  • Liu, J., and S. Ji. 2020. “A Novel Recurrent Encoder-Decoder Structure for Large-Scale Multi-View Stereo Reconstruction from an Open Aerial Dataset.” In IEEE Conference on Computer Vision and Pattern Recognition, 6049–6058. doi:10.1109/CVPR42600.2020.00609.
  • Liu, W. C., B. Wu, and C. Wöhler. 2018. “Effects of Illumination Differences on Photometric Stereo Shape-And-Albedo-From-Shading for Precision Lunar Surface Reconstruction.” ISPRS Journal of Photogrammetry and Remote Sensing 136: 58–72. doi:10.1016/j.isprsjprs.2017.12.010.
  • Lohse, V., C. Heipke, and R. L. Kirk. 2006. “Derivation of Planetary Topography Using Multi-Image Shape-From-Shading.” Planetary and Space Science 54 (7): 661–674. doi:10.1016/j.pss.2006.03.002.
  • Ma, X., Y. Gong, Q. Wang, J. Huang, L. Chen, and F. Yu. 2021. “EPP-Mvsnet: Epipolar-Assembling Based Depth Prediction for Multi-View Stereo.” In IEEE/CVF International Conference on Computer Vision (ICCV), 5712–5720. doi:10.1109/ICCV48922.2021.00568.
  • Maurer, D., Y. C. Ju, M. Breuß, and A. Bruhn. 2018. “Combining Shape from Shading and Stereo: A Joint Variational Method for Estimating Depth, Illumination and Albedo.” International Journal of Computer Vision 126: 1342–1366. doi:10.1007/s11263-018-1079-1.
  • Panagopoulos, A., S. Hadap, and D. Samaras. 2012. “Reconstructing Shape from Dictionaries of Shading Primitives.” In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 80–94. Berlin: Springer. doi:10.1007/978-3-642-37447-0_7.
  • Peng, J., Y. Zhang, and J. Shan. 2015. “Shading-Based DEM Refinement Under a Comprehensive Imaging Model.” ISPRS Journal of Photogrammetry and Remote Sensing 110: 24–33. doi:10.1016/j.isprsjprs.2015.09.012.
  • Quéau, Y., J. Mélou, J.-D. Durou, and D. Cremers. 2017. “Dense Multi-View 3D-Reconstruction Without Dense Correspondences.” arXiv preprint arXiv:1704.00337.
  • Rothermel, M., and K. Wenzel. 2012. “SURE - Photogrammetric Surface Reconstruction from Imagery.” In Proceedings LC3D Workshop, 1–9.
  • Shan, J., Z. Hu, P. Tao, L. Wang, S. Zhang, and S. Ji. 2020. “Toward a Unified Theoretical Framework for Photogrammetry.” Geo-Spatial Information Science 23 (1): 75–86. doi:10.1080/10095020.2020.1730712.
  • Tang, X., G. Zhang, X. Zhu, H. Pan, Y. Jiang, P. Zhou, and X. Wang. 2013. “Triple Linear-Array Image Geometry Model of ZiYuan-3 Surveying Satellite and Its Validation.” International Journal of Image and Data Fusion 4 (1): 33–51. doi:10.1080/19479832.2012.734340.
  • Tao, P. 2016. 3D Surface Reconstruction and Optimization Based on Geometric and Radiometric Integral Imaging Model. Wuhan: Wuhan University.
  • Toutin, T. 2004. “Comparison of Stereo-Extracted DTM from Different High-Resolution Sensors: SPOT-5, EROS-A, IKONOS-II, and QuickBird.” IEEE Transactions on Geoscience and Remote Sensing 42 (10): 2121–2129. doi:10.1109/TGRS.2004.834641.
  • Woodham, R. J. 1980. “Photometric Method for Determining Surface Orientation from Multiple Images.” Optical Engineering 19 (1): 139–144. doi:10.1117/12.7972479.
  • Wu, B., W. C. Liu, A. Grumpe, and C. Wöhler. 2018. “Construction of Pixel-Level Resolution Dems from Monocular Images by Shape and Albedo from Shading Constrained with Low-Resolution DEM.” ISPRS Journal of Photogrammetry and Remote Sensing 140: 3–19. doi:10.1016/j.isprsjprs.2017.03.007.
  • Wu, C., B. Wilburn, Y. Matsushita, and C. Theobalt. 2011. “High-Quality Shape from Multi-View Stereo and Shading Under General Illumination.” In IEEE Conference on Computer Vision and Pattern Recognition, 969–976. doi: 10.1109/CVPR.2011.5995388.
  • Xie, J., G. Huang, R. Liu, C. Zhao, J. Dai, T. Jin, F. Mo, et al. 2020. ”Design and Data Processing of China’s First Spaceborne Laser Altimeter System for Earth Observation: GaoFen-7”. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 13: 1034–1044. doi:10.1109/JSTARS.2020.2977935.
  • Xu, D., Q. Duan, J. Zheng, J. Zhang, J. Cai, and T. J. Cham. 2018. “Shading-Based Surface Detail Recovery Under General Unknown Illumination.” IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (2): 423–436. doi:10.1109/TPAMI.2017.2671458.