Research Article

Virtual image-based cloud removal for Landsat images

Article: 2160411 | Received 28 Aug 2022, Accepted 15 Dec 2022, Published online: 05 Jan 2023

ABSTRACT

The inevitable thick cloud contamination in Landsat images has severely limited the usability and applications of these images. Developing cloud removal algorithms has been a hot research topic in recent years. Many previous algorithms used one or multiple cloud-free image(s) in the same area acquired on other date(s) as reference image(s) to reconstruct missing pixel values. However, it remains challenging to determine the optimal reference image(s). In addition, abrupt land cover change can substantially degrade the reconstruction accuracies. To address these issues, we present a new cloud removal algorithm called Virtual Image-based Cloud Removal (VICR). For each cloud region, VICR reconstructs the missing surface reflectance in three steps: construction of a virtual image within the cloud region based on time-series reference images, similar pixel selection using the newly proposed temporally weighted spectral distance (TWSD), and residual image estimation. By establishing two buffer zones around the cloud region, VICR allows automatic selection of the optimal set of time-series reference images. The effectiveness of VICR was validated at four testing sites with different landscapes (i.e. urban, croplands, and wetlands) and land change patterns (i.e. phenological change, abrupt change caused by flooding and tidal inundation), and the performances were compared with mNSPI (modified neighborhood similar pixel interpolator), WLR (weighted linear regression) and ARRC (AutoRegression to Remove Clouds). Experimental results showed that VICR outperformed the other algorithms and achieved higher Correlation Coefficients and lower Root Mean Square Errors in surface reflectance estimation at the four sites. The improvement is particularly noticeable at the sites with abrupt land change. By considering the difference in the contributions from the reference images, TWSD can select more reliable similar pixels to improve the prediction of abrupt change in surface reflectance. Moreover, VICR is more robust to different cloud sizes and to changing reference images. VICR is also computationally much faster than ARRC. The framework for time-series image cloud removal by VICR has great potential to be applied to processing large datasets.

1. Introduction

Remote sensing images acquired by optical satellite sensors are important earth observation data sources and play irreplaceable roles in studies of global climate change and natural resource management (Kokhanovsky Citation2013; Nijland, Reshitnyk, and Rubidge Citation2019; Wang et al. Citation2021; Chen et al. Citation2022). Among the optical satellite sensors, the Landsat series, including the Thematic Mapper (TM), Enhanced Thematic Mapper Plus (ETM+), and Operational Land Imager (OLI), is possibly the most widely used owing to its medium spatial resolution, regular revisit cycle, and long-term data continuity (Huang et al. Citation2021; Zhu et al. Citation2020). However, cloud contamination in Landsat images is inevitable because approximately one-third of the global land area is obscured by clouds. For some areas, the probability of cloud cover is even higher (Shen et al. Citation2015; Cao et al. Citation2020). Clouds and their shadows lead to significant reductions in the availability of valid observations and restrict real-time monitoring of land surface change (Wang et al. Citation2021; Li et al. Citation2021). Compared to thin clouds, thick clouds result in missing image values as they completely block the electromagnetic energy reflected from the land surface. Reconstruction of the images covered by thick clouds and their shadows (hereafter called cloud removal) has become a hot research topic for the purpose of wider application of time-series Landsat images (Shen et al. Citation2015; B. Chen et al. Citation2017; Luo et al. Citation2020; Zhang et al. Citation2020; Li et al. Citation2016).

The basic idea of cloud removal approaches is to predict unknown pixels in cloud/shadow areas with the assistance of known information. According to the sources of auxiliary information, cloud removal methods can be divided into three categories: (1) single image-based methods, (2) multi-sensor image-based methods, and (3) single-sensor reference image(s)-based methods. Single image-based methods restore cloud pixels based on the assumption that the missing data and the surrounding cloudless data share the same statistical distribution or geometric structures. Typical methods include spatial interpolation (Siravenha et al. Citation2011; Li et al. Citation2014), and deep learning algorithms such as Generative Adversarial Networks (Xu et al. Citation2022; Zheng, Liu, and Wang Citation2021). These methods do not require additional data sources, but they produce unreliable results for large clouds. The second category of methods uses auxiliary information from different sensors such as synthetic aperture radar (SAR) images that are less affected by cloud, or MODIS images that have a short revisit cycle (Shen et al. Citation2019; Gao et al. Citation2021; Meraner et al. Citation2020). Due to radiometric inconsistencies and spatial resolution discrepancies, these methods have limited performances in areas with complex spatial patterns (Shen et al. Citation2019; Li et al. Citation2019). Single-sensor reference image(s)-based methods utilize cloud-free images acquired by the same sensor on other date(s) as reference image(s). The spectral and spatial relationships between the neighboring pixels and the cloud pixels (also called target pixels) are quantified based on the reference image(s) and used to fill the missing values (Chen et al. Citation2011; Zhu et al. Citation2012; Zeng, Shen, and Zhang Citation2013; Vuolo, Wai-Tim, and Atzberger Citation2017; Malek et al. Citation2018; Cao et al. Citation2020; Zhang et al. Citation2020; Hu et al. Citation2019). As images acquired by the same sensor (e.g. Landsat TM/ETM+/OLI) are not affected by radiometric or geometric inconsistencies, this type of method has become the mainstream for thick cloud removal. Representative and commonly used methods include the Neighborhood Similar Pixel Interpolator (NSPI) (Chen et al. Citation2011), the modified neighborhood similar pixel interpolator (mNSPI) (Zhu et al. Citation2012), and the weighted linear regression model (WLR) (Zeng, Shen, and Zhang Citation2013).

Most previous research relied on one cloudless image as the reference image (Zhu et al. Citation2012; Cheng et al. Citation2014; Zhang et al. Citation2018; Xu et al. Citation2022). The reconstruction accuracies depend on the similarity between the reference image and the cloud image. Generally, a larger time interval between the reference image and the cloud image indicates a weaker correlation between them (Lin et al. Citation2013; Du et al. Citation2019). Recent methods proposed the utilization of multi-temporal reference images (B. Chen et al. Citation2017; Cao et al. Citation2020; Zhang et al. Citation2020; Zhou et al. Citation2022). For example, B. Chen et al. (Citation2017) sorted multi-temporal reference images and utilized the three most similar reference images. Cao et al. (Citation2020) proposed the AutoRegression to Remove Clouds (ARRC) algorithm, which uses multi-year (normally 3–4 years) Landsat surface reflectance time series to restore missing data. Zhang et al. (Citation2020) proposed a spatio-temporal patch group deep learning framework based on multi-temporal images. Incorporation of multiple reference images effectively reduces the dependency on a single reference image. However, it is challenging to determine the optimal set of reference images because too much dependency on time-series reference information might lead to adverse effects when abrupt land cover change occurs in the cloud image (Cao et al. Citation2020; Zhang et al. Citation2021).

Another key to the success of cloud removal algorithms is how to integrate the reference information to fill the missing values. Two types of strategies, i.e. pixel-based and patch-based strategies, have been proposed. The pixel-based strategy aims to predict the value of each target pixel based on neighboring pixels. Similarity metrics between neighboring pixels and the target pixel are usually developed to select similar neighboring pixels (Zhu et al. Citation2012; Cheng et al. Citation2014; Chen et al. Citation2017; Cao et al. Citation2020; Padhee and Dutta Citation2019). However, it is difficult to determine whether the similar pixels are still similar to the target pixel at the time of cloud image acquisition. Abrupt land cover change can lead to obvious change in the target cloud pixel, thus the utilization of “similar” pixels identified at other times leads to reconstruction errors (Chen et al. Citation2018; Li et al. Citation2019). We expect that paying more attention to the reference images with greater similarities with the target image might help to alleviate the errors in similar pixel selection. The patch-based strategy involves learning relationships between pairs of cloudless and simulated cloud-contaminated image patches used as training data. Deep learning models, such as the spatio-temporal patch group deep learning framework, have been developed (Zhang et al. Citation2020). However, to date the proposed deep learning models primarily aimed to restore textural and structural patterns in the target image rather than achieve high accuracy in the surface reflectance estimation of individual pixels (Zhang et al. Citation2020, Citation2021; Xu et al. Citation2022). For example, Xu et al. (Citation2022) aimed to reconstruct Landsat Level-1 digital number images. The 16-bit images were down-sampled to 8-bit to be better suited for deep learning models. In addition, requirements for large numbers of training samples and the complex parameter settings of deep learning restrict its wide application for operational cloud removal.

In this study, we propose a new cloud removal algorithm for Landsat surface reflectance (SR) images that we call Virtual Image-based Cloud Removal (VICR). It combines virtual image construction and pixel-based prediction by using temporal, spectral, and spatial information in time-series reference images. The idea of the virtual image is to produce a synthetic reference image within the cloud region based on time-series reference images, which is closer to the image at the prediction time than any of the original reference images. Although the concept of the virtual image was developed for spatio-temporal fusion of Landsat and MODIS data (Wang et al. Citation2020), it is introduced into cloud removal for the first time here. In addition, VICR provides two functions different from previous cloud removal algorithms: (1) it allows automatic selection of the optimal set of temporal reference images; (2) it provides a new similarity index, namely the temporally weighted spectral distance (TWSD), for similar pixel selection that aims to reduce prediction errors caused by temporal changes. We expect VICR to achieve more stable cloud removal performance in landscapes with complex spatial patterns and abrupt change than the previous cloud removal algorithms based on temporal reference image(s). In addition, we expect that VICR is computationally efficient and can be practically applied to cloud removal for time-series images.

2. Virtual Image-based Cloud Removal (VICR) algorithm

VICR performs cloud removal for each cloud region separately. The basic idea of VICR is that Landsat SR image at band n within a cloud region can be reconstructed by the sum of a virtual image that is linearly transformed from time-series reference images and a residual image, which can be represented as the following equation:

(1) $\hat{L}_{p\_cr}(n) = L_{v\_cr}(n) + \gamma_{cr}(n)$

where L^p_cr(n) denotes the Landsat SR image within the cloud region at band n at the prediction time (hereafter called the target time), Lv_cr(n) denotes the virtual SR image at band n within the cloud region, and γcr(n) denotes the residual image within the cloud region.

VICR includes three main steps: (1) construct the virtual image Lv_cr(n) in the cloud region by local linear transformation of time-series reference images; in this step, the optimal set of reference images is determined (Section 2.1); (2) for each pixel in the cloud region, select similar pixels in the neighborhood of the cloud region with TWSD (Section 2.2); (3) construct the residual image γcr(n) for the cloud region based on the similar pixels and then predict the SR image L^p_cr(n) in the cloud region (Section 2.3) (Figure 1).

Figure 1. The flowchart of VICR.


2.1 Construct virtual image within cloud region

VICR constructs the virtual image within the cloud region for each band. For band n, Lv_cr(n) is a linear transformation of the time-series reference images in the same region. It can be expressed as:

(2) $L_{v\_cr}(n) = \sum_{t=1}^{T} a_t(n) \times L_{t\_cr}(n) + b(n)$

where Lt_cr(n) is the reference image in the cloud region acquired at time t. Note that Lt_cr(n) should be cloudless in the cloud region. T is the total number of reference images. at(n) is the coefficient of the tth reference image and b(n) is a constant. Both parameters are to be estimated, and the optimal set of time-series reference images is to be determined.

2.1.1 Estimation of at(n) and b(n)

To estimate at(n) and b(n), we assume that the image in the cloud region and the neighborhood image around the cloud region share the same transformation function. Therefore, we generate a buffer area outside the cloud region using a dilation operation with a window size of 31 × 31 pixels (“imdilate” function in MATLAB). The buffer area has a width of approximately 15 pixels, and we call it “buffer zone 1” (buf1) (Figure 2). Next, at(n) and b(n) are estimated using the linear regression model fitted between the time-series reference images within buf1 (Lt_buf1(n)) and the image at the target time within buf1 (Lbuf1(n)).

Figure 2. Establishment of buffer zone 1 (buf1), buffer zone 2 (buf2) around the cloud region.


(3) $L_{buf1}(n) = \sum_{t=1}^{T} a_t(n) \times L_{t\_buf1}(n) + b(n) + \gamma_{buf1}(n)$

where at(n) and b(n) are derived using the least squares method. γbuf1(n) represents the residual image of band n within buf1.
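For illustration, a minimal MATLAB sketch of this step is given below. It builds buf1 by dilating the cloud mask and fits the per-band regression of Eq. 3 by least squares. All variable names (cloudMask, refStack, target) are assumptions for this sketch; it is not the released implementation.

```matlab
% Minimal sketch of Section 2.1.1 for one band n (assumed variable names).
% cloudMask : logical matrix, true inside the cloud region
% refStack  : H x W x T stack of the T reference images at band n
% target    : H x W target image at band n (values unknown inside cloudMask)

buf1 = imdilate(cloudMask, strel('square', 31)) & ~cloudMask;  % ~15-pixel-wide ring (Figure 2)

T   = size(refStack, 3);
idx = find(buf1);                          % linear indices of the buf1 pixels
X   = zeros(numel(idx), T);
for t = 1:T
    band    = refStack(:, :, t);
    X(:, t) = band(idx);                   % L_t_buf1(n)
end
y = target(idx);                           % L_buf1(n)

coef = [X, ones(numel(idx), 1)] \ y;       % least squares: [a_1(n) ... a_T(n), b(n)]
a    = coef(1:T);
b    = coef(end);
gamma_buf1 = y - [X, ones(numel(idx), 1)] * coef;   % residual image in buf1 (Eq. 3)
```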

2.1.2 Selection of the optimal set of time-series reference images

In the above process, it is clear that the number and the acquisition times of the reference images affect the accuracy of the regression fitting. We expect that a set of reference images can be selected to minimize γbuf1(n) so that the virtual image estimated from Eq. 3 is as close as possible to the real image at the target time. For this purpose, we build a new buffer area with the same window size outside buf1, which is called buffer zone 2 (buf2) (Figure 2). The virtual image at the target time within buf2 (Lv_buf2) is obtained through the regression parameters at(n), b(n) and the reference images within buf2 (Lt_buf2(n)) (Eq. 4).

(4) $L_{v\_buf2}(n) = \sum_{t=1}^{T} a_t(n) \times L_{t\_buf2}(n) + b(n)$

We expect Lv_buf2 to be very close to the real image at the target time, Lbuf2. We assume that the reference images that yield the most accurate estimation of Lv_buf2 also result in the highest fitting accuracy in buf1 as well as in the cloud region. We call these reference images the optimal set of reference images. In this study, we adopt an iterative strategy to select the optimal set of reference images.

Step 1. Select the image that is temporally closest to the target image and has no cloud/shadow in the cloud region as the first reference image (t1 in Figure 3). If two candidate images have the same temporal interval with the target image, the image acquired before the target date is preferred as the first reference image. Then, calculate the coefficients a1(n) and b(n) by Eq. 3.

Figure 3. An illustration of selection and sorting of reference images. The tick marks denote the image acquisition dates. The time interval between neighboring tick marks is 16 days. The red tick mark denotes the date when the target image was acquired. The images on the gray tick marks have cloud pixels in the cloud region, therefore they are not considered as reference images. The images on the blue tick marks are cloud free within the cloud region. The arrows show the sequence of image selection. It starts from the temporally closest image before the target date (shown with a green square arrowhead), and then visits the image on the other side of the target date. In this example, the images on t1, t2, … , and t7 are added as input in sequence.


Step 2. Calculate Lv_buf2(n) using the coefficients (Eq. 4) and then calculate the root mean square error of Lv_buf2 averaged over all bands (RMSE‾) following Eq. 5.

(5) $\overline{RMSE} = \frac{1}{N} \sum_{n=1}^{N} \sqrt{\frac{\sum_{k=1}^{K} \left( L_{v\_buf2}(x_k, y_k, n) - L_{buf2}(x_k, y_k, n) \right)^2}{K}}$

where (xk, yk) is the position of the kth pixel and K represents the total number of pixels in buf2. N represents the total number of image bands.

Step 3. Select another image on the opposite side of the previous reference image date on the time axis (Figure 3). If the image is cloud free within the cloud region, consider it as the second reference image and add it to the input of Eq. 3 (t2 in Figure 3); otherwise, skip it and go to the image on the other side of the time axis. Calculate the virtual image within buf2 and its RMSE‾. Repeat Step 3 until RMSE‾ no longer declines. At this point, the selected images are considered the optimal set of time-series reference images.
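This iterative selection can be sketched as follows. The helper names fitBuf1 and predictBuf2 are assumptions standing in for Eq. 3 and Eq. 4, and candidates is taken to hold the cloud-free reference images already ordered by the alternating rule in Figure 3; the sketch is a simplified outline, not the released code.

```matlab
% Minimal sketch of Section 2.1.2 (assumed helper and variable names).
% candidates : indices of cloud-free reference images, ordered as in Figure 3
% fitBuf1    : fits Eq. 3 in buf1 and returns the coefficients a_t(n), b(n)
% predictBuf2: applies Eq. 4 to obtain the virtual image in buf2 (K x N matrix)
% L_buf2     : actual image values in buf2 (K x N matrix)
selected = [];
bestRMSE = Inf;
for k = 1:numel(candidates)
    trial   = [selected, candidates(k)];
    [a, b]  = fitBuf1(refStack, target, buf1, trial);       % Eq. 3
    Lv_buf2 = predictBuf2(refStack, a, b, buf2, trial);      % Eq. 4
    err     = Lv_buf2 - L_buf2;                              % K x N differences
    rmseBar = mean(sqrt(mean(err.^2, 1)));                   % Eq. 5, averaged over bands
    if rmseBar < bestRMSE
        bestRMSE = rmseBar;
        selected = trial;          % keep the newly added reference image
    else
        break;                     % RMSE‾ stopped declining: optimal set found
    end
end
```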

2.1.3 Generation of Lv_cr(n) and γbuf1(n)

Based on the optimal set of time-series reference images, we can generate the virtual image in the cloud region at target time (Lv_cr(n)) using Eq. 2 and the residual image in buf1 (γbuf1(n)) using Eq. 3.

2.2 Construct TWSD and select similar pixels

In Sections 2.2 and 2.3, we aim to estimate the residual image γcr(n) for the cloud region. The residual of each pixel in the cloud region is estimated by a weighted average of the residuals of its similar pixels in γbuf1(n). To capture the change in SR in the temporal domain, we propose a new similarity index, i.e. the temporally weighted spectral distance (TWSD, Eq. 6), in which both spectral and temporal distances between pixels are considered.

(6) $TWSD_{ij} = \sqrt{\frac{\sum_{t=1}^{T} \sum_{n=1}^{N} \left[ a_t(n) \left( L_{t\_cr}(x_i, y_i, n) - L_{t\_buf1}(x_j, y_j, n) \right) \right]^2}{N}}$

where (xi, yi) is the position of the ith pixel in the cloud region and (xj, yj) is the position of the jth pixel in buf1. TWSDij represents the distance between the ith pixel and the jth pixel in both the spectral and temporal domains. Unlike previous spectral distance metrics (X. Zhu et al. Citation2012; B. Chen et al. Citation2017; Cao et al. Citation2020), we incorporate the transformation coefficient at(n) in the calculation of the distance. A larger at(n) indicates that the corresponding reference image has a higher contribution to the target image, thus the neighboring pixels at time t should be given greater weights. For each pixel in the cloud region, we select the 20 pixels with the lowest TWSD (i.e. M = 20) as the similar pixels of the target pixel. These similar pixels are used to estimate the residuals of the target pixel and then to obtain the residual image in the cloud region.
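A minimal sketch of the TWSD computation for a single cloud pixel is given below. The variable names are assumptions, and the loops are kept explicit for clarity rather than vectorized.

```matlab
% Minimal sketch of Eq. 6 for one cloud pixel i (assumed variable names).
% refs_cr   : T x N reference values of pixel i        (L_t_cr(x_i, y_i, n))
% refs_buf1 : J x T x N reference values of the J buf1 pixels
% a         : T x N regression coefficients a_t(n) from Eq. 3
[J, T, N] = size(refs_buf1);
TWSD = zeros(J, 1);
for j = 1:J
    s = 0;
    for t = 1:T
        for n = 1:N
            d = refs_cr(t, n) - refs_buf1(j, t, n);
            s = s + (a(t, n) * d)^2;       % temporally weighted squared difference
        end
    end
    TWSD(j) = sqrt(s / N);
end
[~, order] = sort(TWSD, 'ascend');
similarIdx = order(1:20);                  % the M = 20 most similar buf1 pixels
```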

2.3 Estimate residual image and recover cloud image

For a target pixel (xi, yi) in the cloud region, its residual is estimated using a linearly weighted sum approach that considers the spatial and spectral-temporal distances to its similar pixels. The spatial distance Distij between the target pixel i and its similar pixel j is calculated as their Euclidean distance.

(7) $Dist_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$

We use TWSDij to represent the spectral-temporal distance between pixels i and j. Then, Distij and TWSDij are normalized through:

(8) $D_{ij} = \frac{Dist_{ij} - \min(Dist_i)}{\max(Dist_i) - \min(Dist_i)} + 1$
(9) $ST_{ij} = \frac{TWSD_{ij} - \min(TWSD_i)}{\max(TWSD_i) - \min(TWSD_i)} + 1$

where Dij and STij represent the normalized spatial distance and spectral-temporal distance, respectively. Disti and TWSDi represent the arrays composed of the spatial distances [Disti1, Disti2, …, DistiM] and spectral-temporal distances [TWSDi1, TWSDi2, …, TWSDiM] between the target pixel i and the M similar pixels, respectively (M = 20). The max and min operators return the maximum and minimum values of the array. The weight wij of the similar pixel j of the target pixel i is calculated as:

(10) $w_{ij} = \frac{1 / (D_{ij} \times ST_{ij})}{\sum_{j=1}^{M} 1 / (D_{ij} \times ST_{ij})}$

Finally, for the cloud region we obtain the residual value of the target pixel i by weighted summation of the residuals of the similar pixels in buf1.

(11) $\gamma_{cr}(x_i, y_i, n) = \sum_{j=1}^{M} w_{ij} \times \gamma_{buf1}(x_j, y_j, n)$

When the residual image of the cloud region γcr(n) is constructed, it is added to the virtual image Lv_cr obtained from Eq. 2, following Eq. 1, to predict the cloud-free image in the cloud region L^p_cr. Following the above procedures, we reconstruct each cloud region separately. Once the images in all cloud regions are reconstructed, they are combined to obtain the final cloud-free image scene.
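Putting Eqs. 7–11 together, the residual transfer for one target pixel can be sketched as below (assumed variable names, M = 20 similar pixels).

```matlab
% Minimal sketch of Eqs. 7-11 for one target pixel i at (xi, yi).
% simXY    : M x 2 positions (x, y) of the selected similar pixels in buf1
% simTWSD  : M x 1 TWSD values of the similar pixels (Eq. 6)
% simResid : M x N residuals of the similar pixels taken from gamma_buf1 (Eq. 3)
% Lv_cr_i  : 1 x N virtual-image value of pixel i (Eq. 2)
Dist = sqrt((simXY(:, 1) - xi).^2 + (simXY(:, 2) - yi).^2);              % Eq. 7
D    = (Dist    - min(Dist))    / (max(Dist)    - min(Dist))    + 1;     % Eq. 8
ST   = (simTWSD - min(simTWSD)) / (max(simTWSD) - min(simTWSD)) + 1;     % Eq. 9
w    = (1 ./ (D .* ST)) / sum(1 ./ (D .* ST));                           % Eq. 10
gamma_cr_i = w' * simResid;                                              % Eq. 11 (1 x N)
L_pred_i   = Lv_cr_i + gamma_cr_i;                                       % Eq. 1
```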

3. Experimental data and evaluation

3.1 Testing sites

Four sites with different landscape heterogeneities and land change patterns were selected to evaluate the VICR method (Figure 4). The first site is in the Jianghan Plain in Hubei Province, China (112.8°E, 30.2°N). This site is an important agricultural production area in Central China. The crop types are dominated by rapeseed and paddy rice rotations (Fang et al. Citation2021), and the croplands are fragmented, presenting heterogeneous spatial patterns. The second site is in the Lower Gwydir Catchment, Australia (149.36°E, 29.08°S). Land uses include grazing, dryland cropping, horticulture, and irrigated lands (Shi et al. Citation2014). A flood event occurred in mid-December 2004, resulting in an abrupt land cover change shown on the Landsat 5 TM image on December 12th, 2004. The third site covers urban and suburban areas in Beijing (116.45°E, 39.95°N), China, presenting great spatial heterogeneity and vegetation phenological dynamics. The fourth site is at the coastal wetland in the Yellow River Delta in Shandong Province, China (119.2°E, 37.75°N). As an area with intense land-sea interactions, land surface reflectance changes frequently due to tidal inundation and sediment accumulation. In recent years, the landscape patterns in the Yellow River Delta have changed dramatically due to the rapid expansion of the invasive plant species Spartina alterniflora (Z. Wang et al. Citation2021).

Figure 4. Actual images and cloud-simulated images in standard false color composition over (A) Jianghan Plain (Landsat 8 OLI), (B) Lower Gwydir Catchment (Landsat 5 TM), (C) Beijing (Landsat 8 OLI), and (D) Yellow River Delta (Landsat 8 OLI). Acquisition dates are in the parentheses in the format of MM/DD/YY. The image sizes for the testing sites A-C are 1200 × 1200 pixels, and that of testing site D is 1397 × 1234 pixels. The numbers of simulated cloud pixels in (A-D) are 93,312, 107,529, 89,813, and 103,174, respectively.


3.2 Landsat data and preprocessing

To evaluate the VICR method, we selected a cloud-free Landsat image at each site and simulated a cloud region on it (hereafter we call this the target image) (Figure 4). The cloud region has around 90,000–110,000 pixels. Time-series Landsat images with cloud cover less than 80% acquired before and after the target date were considered as candidate reference images. All Landsat images were Collection 1 Tier 1 Level 2 surface reflectance products downloaded from Google Earth Engine. For each Landsat image, we used the quality assurance band to detect cloud and cloud shadow regions and mask them. We did not use Landsat 7 ETM+ images because the images had missing strips after the scan-line corrector failed in 2003.
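As an illustration of the masking step, a minimal MATLAB sketch is given below. The file names are hypothetical, and the pixel_qa bit positions (bit 3 = cloud shadow, bit 5 = cloud) and the 0.0001 scale factor follow our reading of the Collection 1 Level-2 documentation and should be verified for the product version in use.

```matlab
% Minimal sketch of cloud/shadow masking with the Collection 1 pixel_qa band.
qa = imread('LC08_pixel_qa.tif');                          % hypothetical file name (uint16 QA band)
sr = double(imread('LC08_sr_band5.tif')) * 0.0001;         % hypothetical SR band, nominal 0.0001 scale

cloud  = bitget(qa, 6) == 1;       % bit 5 = cloud (bitget is 1-based)
shadow = bitget(qa, 4) == 1;       % bit 3 = cloud shadow
sr(cloud | shadow) = NaN;          % flag contaminated pixels as missing
```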

3.3 Assessment metrics

The performance of VICR was evaluated by comparing the reconstructed images with the actual image in the cloud region. Quantitative assessment was conducted using the Correlation Coefficient (CC), Root Mean Square Error (RMSE) and Structural Similarity Index Measure (SSIM). CC represents the degree of agreement between the estimated and the real values. RMSE measures the statistical difference in pixel values between the predicted image and the actual image. SSIM evaluates the overall structural similarity between the two images (Zhou et al., Citation2004). The quantitative evaluation was carried out for six bands of the Landsat OLI/TM sensors, i.e. the blue, green, red, near infrared (NIR), short-wave infrared 1 (SWIR1), and short-wave infrared 2 (SWIR2) bands.
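For reference, the three metrics can be computed per band as in the short MATLAB sketch below. The variable names are assumptions, and ssim requires the Image Processing Toolbox.

```matlab
% Minimal sketch of the assessment metrics for one band (assumed variable names).
% pred, truth       : vectors of reconstructed and actual SR in the cloud region
% predImg, truthImg : the corresponding 2-D image patches (same class and range)
R    = corrcoef(pred, truth);
CC   = R(1, 2);                              % Correlation Coefficient
RMSE = sqrt(mean((pred - truth).^2));        % Root Mean Square Error
SSIM = ssim(predImg, truthImg);              % Structural Similarity Index Measure
```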

4. Experiment design and results

Six experiments were designed to evaluate the performance of the VICR method from multiple aspects. The design and purpose of each experiment are shown in Table 2. We selected mNSPI (modified neighborhood similar pixel interpolator), WLR (weighted linear regression), and ARRC (AutoRegression to Remove Clouds) for comparison because they all involve using both temporal and spatial auxiliary information in reference image(s) for cloud removal. The source codes are publicly available except for ARRC (mNSPI: https://xiaolinzhu.weebly.com; WLR: http://sendimage.whu.edu.cn/send-resource-download). mNSPI and WLR are based on a single reference image and have been widely used (Zhu et al. Citation2012; Zeng, Shen, and Zhang Citation2013). mNSPI is provided in IDL code, and WLR is provided as an executable program. ARRC is a recently proposed cloud removal method based on time-series Landsat images (Cao et al. Citation2020). It combines predictions from a long-term component using autoregression of time-series observations (3 years of Landsat images were used as recommended by ARRC) and a short-term component based on a single cloudless reference image. As the source code is not available, we rewrote ARRC in MATLAB. VICR was also coded in MATLAB. All processes in this study were carried out on a workstation (Dell-T8520, CPU: Intel Core i9-10900X 3.70 GHz, RAM: 32 GB). The code for VICR is available at https://github.com/wangzp-ludan/VICR.

Table 1. Reference images and their input orders for VICR in Jianghan Plain, Lower Gwydir Catchment, Beijing and Yellow River Delta.

Table 2. Specific information on the six experiments.

4.1 Experiment I: Assessment on the selection of optimal set of reference images

4.1.1 Experiment design

VICR uses the fitting parameters obtained from buf1 to predict the virtual image in buf2 (Lv_buf2) at the target time, and the RMSE between Lv_buf2 and the actual image in buf2 (Lbuf2) is used to guide the determination of the optimal set of time-series reference images. The basic assumption is that, as the reference images change, the prediction errors in the cloud region present the same variation pattern as the RMSEs of Lv_buf2, because buf2 and the cloud region are spatially close. To justify this assumption, we ran VICR with 1 to 12 reference images as input. The reference images were selected and added to the reference image list following the procedures in Section 2.1 (Figure 3), and the order of the input images is shown in Table 1. The RMSE of Lv_buf2, the RMSE of the virtual image in the cloud region (Lv_cr), and the RMSE of the reconstructed image in the cloud region (L^p_cr) were calculated for the four testing sites, respectively.

4.1.2 Experiment results

In general, the RMSEs of Lv_buf2, Lv_cr and L^p_cr showed consistent patterns with an increasing number of reference images (Figure 5). They all presented a decreasing trend over the first few reference images and then gradually stabilized. For Beijing, the RMSEs of Lv_buf2, Lv_cr and L^p_cr all reached the minimum when three reference images were used as input (Figure 5). For Jianghan Plain, Lower Gwydir Catchment and Yellow River Delta, the RMSE of Lv_buf2 stopped decreasing when the 7th, 6th and 6th reference image was added, respectively (Figure 5). Thus, the first 7, 6, and 6 images were selected as the optimal sets of reference images for the three sites, respectively. With these reference images, however, the RMSE of L^p_cr was not at its minimum. For Jianghan Plain, Lower Gwydir Catchment and Yellow River Delta, the RMSE of L^p_cr reached the minimum with 4, 11 and 10 reference images, respectively (Figure 5). However, little difference was found between the minimum RMSE and the RMSE resulting from the selected set of reference images (1.43 × 10−4 for Jianghan Plain, 0.3 × 10−4 for Beijing, and 4.5 × 10−4 for Yellow River Delta), indicating that good accuracies are still achieved with these reference images.

Figure 5. RMSE‾s of the virtual image in the buffer zone 2 (Lv_buf2), the virtual image in the cloud region (Lv_cr), and the reconstructed image in the cloud region L^p_cr resulting from one to twelve reference images. The minimum RMSE‾s of Lv_buf2 and L^p_cr are marked with triangles.


Figure 5 also shows that the first few reference images temporally close to the target date contributed more than the images farther from the target date. The RMSEs first decreased substantially and then changed little or even increased slightly. It seems that more reference images do not necessarily lead to higher accuracies. For instance, for Beijing the RMSEs of L^p_cr showed a slight increase when more than 3 reference images were used as input; for Lower Gwydir Catchment, the RMSE from 12 reference images was higher than that from 6 and 11 reference images. In addition, we observe that the RMSE of L^p_cr was much lower than that of Lv_cr regardless of the number of reference images, suggesting that the estimation of the residual image based on similar pixels is critical for the estimation of L^p_cr.

4.2 Experiment II: Assessment on TWSD

4.2.1 Experiment design

VICR uses TWSD to select similar pixels in buf1 and to estimate residuals for the target pixels in the cloud region. The newly proposed TWSD considers the spectral similarity between pixels in the temporal domain. To test the efficacy of TWSD, we compared TWSD with the spectral distance (SD) that has been widely used for similar pixel selection in cloud removal algorithms such as NSPI, mNSPI, STWR and the ARRC short-term component. Apart from similar pixel selection, these algorithms also used SD to quantify the contribution of the similar pixels to the target pixel. In this experiment, we designed four scenarios.

Scenario 1: SD was used for both similar pixel selection and residual allocation (substituting TWSD in Eq. 9 with SD) based on the virtual images in the cloud region and buf1. The formulation of SD is given in Eq. 12.

Scenario 2: SD was used to select similar pixels, and TWSD was used to allocate residuals.

Scenario 3: TWSD was used to select similar pixels, and SD was used to allocate residuals.

Scenario 4: TWSD was used for both similar pixel selection and residual allocation (i.e. VICR).

(12) $SD_{ij} = \sqrt{\frac{\sum_{n=1}^{N} \left( L_{v\_cr}(x_i, y_i, n) - L_{v\_buf1}(x_j, y_j, n) \right)^2}{N}}$

For each site and each band of the Landsat image, the cloud-removed image in the cloud region was compared with the actual image, and the RMSE was calculated. By comparing the RMSEs of the reconstruction results, the role of TWSD in similar pixel selection and residual allocation was analyzed.

4.2.2 Experiment results

Figure 6 shows that using TWSD for both similar pixel selection and residual allocation (Scenario 4) achieved the lowest RMSE at the four testing sites (except for the NIR and SWIR1 bands in Beijing), while Scenario 1 produced the highest RMSE. When TWSD was used for similar pixel selection (Scenarios 3 and 4), the reconstruction accuracies were significantly improved regardless of which metric was used for residual allocation (Scenario 3 vs. Scenario 1; Scenario 4 vs. Scenario 2). When SD was used for similar pixel selection, using TWSD for residual estimation slightly improved the reconstruction accuracy (Scenario 2 vs. Scenario 1). Compared to the other three testing sites, the Lower Gwydir Catchment, which experienced abrupt land change, showed greater accuracy improvement when TWSD was used for similar pixel selection (Figure 6). This suggests that the similarity between similar pixels and target pixels plays a critical role in reconstruction, especially for images with substantial land change.

Figure 6. RMSEs of the reconstructed cloud image at each band for (a) Jianghan Plain, (b) Lower Gwydir Catchment, (c) Beijing and (d) Yellow River Delta under the four scenarios in which spectral distance (SD) and/or temporally weighted spectral distance (TWSD) were used for similar pixel selection and/or residual allocation.


4.3 Experiment III: Quantitative evaluation and comparison with other cloud-removal methods

4.3.1 Experiment design

To evaluate the performance of VICR, we compared the reconstruction accuracies of VICR with those of mNSPI, WLR, and ARRC. We conducted two sub-experiments. The first one was performed on the cloud-simulated images in Figure 4. For mNSPI, WLR, and the ARRC short-term component, we used the cloud-free image closest to the cloud-contaminated image as the reference image (Table 1, Fig. S1). The second sub-experiment was performed for each site on another cloud-simulated image (Fig. S2). On these images, three cloud regions with a total of over 120,000 pixels were generated and randomly distributed on the image (Fig. S2). For both sub-experiments, 20 similar pixels were selected for VICR, mNSPI, WLR and the ARRC short-term component for a fair comparison. CC, RMSE and SSIM in the cloud regions were used as quantitative metrics. For the second sub-experiment, the values of the quantitative metrics for the three cloud regions were averaged.

4.3.2 Experiment results

Due to space limitations, here we display the results of the first sub-experiment. The results of the second sub-experiment are shown in the supplementary materials (Fig. S2 and Fig. S3). Generally, the reconstructed images from VICR are visually more similar to the actual image than those from mNSPI, WLR and ARRC (Figure 7). For example, for the patchy croplands in Jianghan Plain, the cloud-removed images from WLR and ARRC showed less spatial heterogeneity and obvious smoothing effects compared with VICR. Some fallow croplands were not well reconstructed. Both VICR and mNSPI better restored spatial details, while VICR showed obviously higher spectral similarity with the actual image (Figure 7). For the Lower Gwydir Catchment site, where flooding caused significant land cover change, the cloud-removed images from mNSPI, WLR and ARRC were more similar to the reference image (Fig. S1). In comparison, the predicted image from VICR showed better spectral similarity with the actual image in part of the flood-inundated region (Figure 7). For the Beijing site, where the land change in urban areas was relatively small, the results from the four methods were close (Figure 7). For the Yellow River Delta, the results from mNSPI, WLR and ARRC demonstrated obvious spectral and texture discrepancies from the actual image, and the tidal creeks were not well restored. In contrast, VICR generated more visually satisfactory results in which the tidal creeks and tidal flats remained intact (Figure 7).

Figure 7. Visual comparisons between mNSPI, WLR, ARRC, and VICR for the cloud-simulated images over (a) Jianghan Plain, (b) Lower Gwydir Catchment, (c) Beijing and (d) Yellow River Delta.


Quantitative assessment results are shown in Figure 8. In general, VICR presented much better performance (lower RMSE, higher CC and SSIM) than mNSPI and WLR for all sites at all bands. The decrease in RMSE ranged from 0.0014 to 0.0223 compared to mNSPI, and from 0.0017 to 0.0193 compared to WLR. For all sites, VICR performed better than ARRC in terms of CCs and RMSEs, especially for the sites with abrupt land change. For Lower Gwydir Catchment, VICR showed considerably higher CCs and lower RMSEs at the red (0.0368 vs. 0.0400), green (0.0312 vs. 0.0334), SWIR1 (0.0422 vs. 0.0463) and SWIR2 (0.0302 vs. 0.0342) bands, indicating that VICR better captured the flood inundation than ARRC (Figure 8). At the SWIR1 and SWIR2 bands, VICR showed higher SSIMs than ARRC. For the Yellow River Delta, the improvements in CCs and RMSEs were more significant compared to the other sites (Figure 8). For example, the CCs of VICR at the NIR, SWIR1 and SWIR2 bands were 0.9507, 0.9221 and 0.9124, respectively, while those of ARRC were 0.9260, 0.8995 and 0.8962. The RMSEs of VICR at the NIR, SWIR1 and SWIR2 bands were 0.0234, 0.0255 and 0.0169, respectively, while those of ARRC were 0.0286, 0.0290 and 0.0185, respectively. In most cases, VICR achieved comparable or slightly higher SSIM than ARRC. For Beijing and Jianghan Plain, VICR produced slightly lower SSIMs than ARRC at the SWIR1 and SWIR2 bands (Figure 8). At the Beijing site, the SSIM values of ARRC at the SWIR1 and SWIR2 bands were 0.9485 and 0.9514, respectively, while those of VICR were 0.9454 and 0.9485. The second sub-experiment obtained similar comparison results to the first sub-experiment (Fig. S2 and Fig. S3), i.e. VICR achieved much better performance than mNSPI and WLR at all sites and all image bands, and in most cases VICR achieved higher CC and SSIM and lower RMSE than ARRC.

Figure 8. Comparisons of CC (left column), RMSE (center column) and SSIM (right column) between mNSPI, WLR, ARRC, and VICR for the cloud-simulated images over (a) Jianghan Plain, (b) Lower Gwydir Catchment, (c) Beijing and (d) Yellow River Delta.


4.4 Experiment IV: Sensitivity to reference images

4.4.1 Experiment design

A robust cloud removal method should be little affected by the selection of reference images. To evaluate the sensitivity of mNSPI, WLR, ARRC, and VICR to different reference images, we conducted two sub-experiments by altering the reference images and analyzing the change in reconstruction performance. The first sub-experiment assumes that the reference images in Experiment III were not available. For mNSPI, WLR, and the ARRC short-term component, we replaced the reference image in Experiment III (images with * in Table 1 and Fig. S1) with another cloud-free image (images with ** in Table 1 and Fig. S1). For the ARRC long-term component and VICR, we removed the reference image from the time-series reference image list. Note that the time intervals between the target image and the new reference images were longer than or the same as those of the original reference images. In the second sub-experiment, we assumed the revisit period of the Landsat satellite to be 32, 48 and 64 days and thus lengthened the time interval between the reference image and the target image, and among the time-series reference images. For each site we predicted the image in the cloud region and compared the performances of the four algorithms.

4.4.2 Experiment results

Figure 9 displays the results from the first sub-experiment. In Jianghan Plain, Lower Gwydir Catchment and Beijing, the RMSEs from all algorithms increased at most bands when using the new reference image (Figure 9). Compared to mNSPI and WLR, ARRC and VICR showed obviously smaller increases in RMSEs, especially at the SWIR1 and SWIR2 bands for Lower Gwydir Catchment (Figure 9), and at all bands for Beijing (Figure 9). For the Beijing site, assuming that the reference image on 09/28/17 (MM/DD/YY) was not available, the next available reference image was on 10/30/17. From the target date (09/12/17) to the end of October, the land surface reflectance had changed considerably due to crop harvesting and vegetation phenological change (Figure S1A). In Lower Gwydir Catchment, although the time intervals between the two reference images (11/26/04 and 12/28/04) and the target image (12/12/04) were both 16 days, the accuracies of mNSPI and WLR decreased significantly because the flood event occurred in mid-December. This indicates that large surface reflectance change has substantial negative effects on the performances of mNSPI and WLR.

Figure 9. Comparisons of RMSEs between mNSPI, WLR, ARRC and VICR for the cloud-simulated images for (a) Jianghan Plain, (b) Lower Gwydir Catchment, (c) Beijing and (d) Yellow River Delta in Experiment IV (colored bars) and in Experiment III (transparent bars with black border).


In the Yellow River Delta, it is interesting that using a reference image temporally farther from the target date (more than a month, from 06/29/18 to 07/31/18) achieved better performance than the original reference image (08/16/18) for all methods (Figure 9). mNSPI, WLR and ARRC all showed considerable decreases in RMSEs at all bands, while VICR presented more stable performance. Examination of the reference images showed that the wetlands at this site experienced tidal inundation due to the high tide level on 08/16, resulting in great surface reflectance change. The reference image on 06/29 is spectrally more similar to the target image. Although the ARRC long-term component utilizes time-series Landsat images, its short-term component is still based on a single reference image. When surface reflectance experiences abrupt change, the short-term component is given a higher weight when combined with the long-term component (Cao et al. Citation2020). As a result, changing the reference image still had considerable impacts on the accuracies of ARRC in the cases where surface reflectance changes abruptly. Compared to ARRC, the change in the RMSEs of VICR was negligible in the Yellow River Delta (Figure 9). In addition, even when the time interval between the reference image and the target image becomes longer, VICR still ensured the lowest RMSE among the four methods at all four sites.

Figure 10 shows the RMSEs averaged over all bands (RMSE‾) from the second sub-experiment. All four algorithms showed increasing RMSE‾ with increasing time intervals among images, while VICR generally showed a lower magnitude of RMSE‾ variation (except at Jianghan Plain). The reconstruction errors of VICR were significantly lower than those of mNSPI and WLR, and slightly lower than those of ARRC (except for the 32-day revisit cycle at Jianghan Plain). In Lower Gwydir Catchment and Yellow River Delta, VICR presented obvious improvements in accuracy (Figure 10).

Figure 10. Comparisons of RMSE‾s between mNSPI, WLR, ARRC and VICR for the cloud-simulated images for (a) Jianghan Plain, (b) Lower Gwydir Catchment, (c) Beijing and (d) Yellow River Delta with revisit periods of 16, 32, 48 and 64 days.


4.5 Experiment V: Sensitivity to different cloud sizes

4.5.1 Experiment design

Intuitively, a larger cloud region means that less auxiliary information can be provided by the surrounding area for the reconstruction, especially for the central area of the cloud region. To evaluate the sensitivity of VICR to different cloud sizes, we generated clouds with sizes of 50 × 50, 100 × 100, 150 × 150, 200 × 200, 250 × 250, 300 × 300, 350 × 350, and 400 × 400 pixels (Figure 11) and reconstructed the images in the cloud region based on the same reference images as in Experiment III. For each method and cloud size, the RMSE at each band in the central 50 × 50 pixels was calculated and compared. Note that for the Yellow River Delta, we used the image on 06/29/18 as the reference image for mNSPI, WLR and the ARRC short-term component to avoid the impacts of tidal inundation and to ensure good results.
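The set-up of this experiment can be sketched as follows; the variable names are assumptions and vicr() is a hypothetical stand-in for the reconstruction call.

```matlab
% Minimal sketch of Experiment V for one band and one cloud size (assumed names).
sz     = 200;                                        % simulated cloud size (pixels)
[H, W] = size(actual);                               % actual: cloud-free band image
mask   = false(H, W);
r0 = round(H/2 - sz/2);  c0 = round(W/2 - sz/2);
mask(r0:r0+sz-1, c0:c0+sz-1) = true;                 % square cloud in the image center

rec = vicr(actual, refStack, mask);                  % hypothetical reconstruction call

ctr = false(H, W);                                   % central 50 x 50 evaluation window
rc = round(H/2);  cc = round(W/2);
ctr(rc-24:rc+25, cc-24:cc+25) = true;
rmseCenter = sqrt(mean((rec(ctr) - actual(ctr)).^2));
```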

Figure 11. Simulated clouds with 50×50, 100×100, 150×150, 200×200, 250×250, 300×300, 350×350 and 400×400 pixels on the actual image in the four sites.


4.5.2 Experiment results

Figure 12 illustrates the RMSEs at the NIR band of the four methods for different cloud sizes. RMSEs at the other bands are shown in Fig. S4-S8 due to space limitations. VICR achieved considerably lower RMSEs than mNSPI and WLR at all bands regardless of the cloud size for Jianghan Plain, Beijing and Yellow River Delta. In most cases, VICR presented slightly lower RMSEs than ARRC. In addition, the RMSEs of VICR showed smaller fluctuations than those of ARRC with varying cloud sizes. This was particularly noticeable at the NIR band in Lower Gwydir Catchment (Figure 12), and at the SWIR1 and SWIR2 bands in Yellow River Delta (Fig. S6). In Lower Gwydir Catchment, the RMSEs of all algorithms showed an upward trend with increasing cloud size (Figure 12). As the cloud covered the area with flood inundation, a smaller cloud region meant that more flood-inundated neighboring pixels could be found. In the extreme case where the cloud covers the entire flood-inundated area, none of the neighboring pixels is spectrally similar to the target region, which certainly results in low reconstruction accuracies. Compared to mNSPI and WLR, VICR presented a smaller increase of RMSE at the SWIR1 and SWIR2 bands (Fig. S6), which are both representative bands showing the water signal. Likewise, in the Yellow River Delta, VICR achieved noticeably lower and more stable RMSEs than the other methods at the SWIR1 and SWIR2 bands for all cloud sizes.

Figure 12. The RMSEs of the reconstructed images from mNSPI, WLR, ARRC, and VICR at NIR band for cloud sizes ranging from 50×50 pixels to 400×400 pixels. Note that the RMSEs were calculated for the 50×50 pixels in the center of the clouds.


4.6 Experiment VI: Performance on real cloud-contaminated images

4.6.1 Experiment design

In this experiment, we performed cloud removal for real cloud-contaminated images and visually compared the reconstruction results of the different methods. Considering practical applications, we also used VICR to remove clouds from the time-series Landsat 8 OLI SR images during 2015–2019 at Jianghan Plain and during 2016–2020 at Yellow River Delta. We calculated the Normalized Difference Vegetation Index (NDVI) from each reconstructed image and compared the time-series NDVIs with the NDVIs from the Terra Moderate Resolution Imaging Spectroradiometer (MODIS) Vegetation Indices Version 6 product (MOD13Q1). We sorted the time-series Landsat images in order of cloud cover percentage and processed them one by one from low to high cloud cover. Each cloud-removed image was then used as a reference image for the images with larger cloud cover. We call this process the “time-series cloud removal procedure”, outlined in the sketch below.
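The sketch below is a simplified outline under assumed names; vicr() is hypothetical, and already-reconstructed scenes enter the reference list of later, cloudier scenes as they are processed.

```matlab
% Minimal sketch of the time-series cloud removal procedure (assumed names).
% images    : cell array of SR scenes; cloudMask : matching cell array of masks
% cloudFrac : vector of cloud-cover fractions per scene
[~, order] = sort(cloudFrac, 'ascend');              % process from low to high cloud cover
clean = images;
for k = order(:)'
    others   = setdiff(1:numel(clean), k);           % all other scenes serve as candidates
    clean{k} = vicr(clean{k}, clean(others), cloudMask{k});   % hypothetical VICR call
end
```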

4.6.2 Experiment results

As illustrated in Figure 13, the images reconstructed by VICR generally recovered better spatial details and maintained more realistic spectral characteristics compared to the other algorithms. In Jianghan Plain, the reconstructed image from VICR presented greater spectral heterogeneity in the cropland patches compared to the other algorithms. The boundaries between the cropland patches were more visible in the image reconstructed by VICR. In Lower Gwydir Catchment, the images reconstructed by VICR, mNSPI and WLR presented similar spatial details, while that by ARRC presented a smoother effect. In Yellow River Delta, it is obvious that the QA band detected larger cloud/cloud shadow areas than those in the actual image. The zoomed-in actual image covers a Spartina alterniflora patch (Wang et al. Citation2021), which is partially contaminated by clouds, on the southern coast of Yellow River Delta (yellow box in Figure 13). The reconstructed image from VICR restored more accurate spectral information than the other methods, as it is spectrally more similar to the actual image.

Figure 13. Visual effects of reconstructed images from VICR, mNSPI, WLR and ARRC on real cloud-contaminated images over (a) Jianghan Plain, (b) Lower Gwydir Catchment, (c) Beijing and (d) Yellow River Delta.


Figure 14 shows that the time-series images recovered by VICR effectively supplemented the number of NDVI observations. The number of observations increased from 31 to 47 during 2015–2019 at the pixel in Jianghan Plain. The NDVI time series presented phenological dynamics well at both sites. In Jianghan Plain, rapeseed was normally harvested around May and then rice was planted. The crop rotation is observed in the MOD13Q1 time series, which showed a local peak around March and a higher local peak around August. The local peak around March 2016 is not observed from the original Landsat NDVI time series but is well captured by the cloud-removed images (Figure 14). Likewise, the growing peak of paddy rice in August 2018 was better captured by the cloud-removed images (Figure 14). For the Phragmites australis marsh wetland pixel in Yellow River Delta, the reconstructed NDVI time series demonstrated greater variations than MOD13Q1 (Figure 14). Negative NDVIs were obtained from December to February because the wetland pixel was covered by water in the leaf-off season. In addition, the reconstructed NDVI time series better demonstrated the growing and senescence patterns of Phragmites australis compared to the original time-series NDVI (e.g. in 2019).

Figure 14. NDVIs from cloud-free and cloud-removed Landsat images at (a) the pixel (30.2477°N, 112.8263°E) at a crop field in Jianghan Plain and (b) the pixel (37.8163°N, 119.0197°E) at Phragmites australis marsh wetland in Yellow River Delta. For comparison, MOD13Q1 NDVI time-series smoothed by spline function are plotted as blue line.


5. Discussion

VICR is different from previous algorithms in that it is based on virtual image construction. It is composed of patch-based virtual image prediction based on time-series reference images and pixel-based residual allocation incorporating spectral, temporal, and spatial information. The advantages of this algorithm are discussed in Sections 5.1-5.4. The uncertainties are discussed in Section 5.5.

5.1 Virtual image construction for optimal reference images selection

The first improvement of VICR is that it allows automatic selection of the optimal set of reference images. It is widely recognized that the selection of reference images has a great impact on the performance of cloud removal. Generally, a temporally closer reference image has more similar spectral characteristics to the target image, so that higher reconstruction accuracy can be achieved (Zhu et al. Citation2012; Wang et al. Citation2021). However, this might not be true when sudden land change occurs. In the coastal wetlands in the Yellow River Delta, the closest reference image resulted in lower accuracies of mNSPI, WLR and ARRC than using a reference image with a longer time interval to the target image. Several approaches have been proposed to address this issue. Some studies considered using CC or SSIM to select the most similar reference image (Lin et al. Citation2013; Kalkan and Derya Maktav Citation2018), but it was also shown that this method did not improve the reconstruction accuracy considerably (Cao et al. Citation2020). The STWR algorithm sorted the reference image patches based on spectral similarity and selected the three most similar patches as reference images (Chen et al. Citation2017). It is hard to determine whether the three selected images can achieve the highest accuracy. Another solution is to incorporate long-term time-series reference images, as in the ARRC long-term component. However, the ARRC long-term component is advantageous in capturing gradual temporal patterns of a pixel rather than capturing abrupt change (Cao et al. Citation2020). Our algorithm balances between ARRC and those algorithms depending on a fixed number of reference images because it can flexibly select the optimal set of reference images through the virtual image construction strategy. In virtual image construction, the coefficient at represents the contribution of each reference image to the target image. As shown in Fig. S9, when the second reference image was added for Lower Gwydir Catchment and Yellow River Delta, the at value of the first reference image decreased substantially and became lower than that of the second image, indicating that the second reference image has a greater contribution in prediction. The varying at values when adding input reference images ensure that the land change pattern can be captured. In addition, based on our results in Section 4.1, we recommend that the optimal number of reference images for each cloud region need not exceed 12, because additional reference images beyond this no longer significantly improve the reconstruction accuracy.

The idea of the virtual image comes from the virtual image pair-based spatio-temporal fusion (VIPSTF) algorithm proposed by Wang et al. (Citation2020), which constructs virtual Landsat-MODIS image pairs to predict an image with the same spatial resolution as Landsat and the same temporal resolution as MODIS. The virtual image pair was generated by transplanting the linear relationship between time-series MODIS images to Landsat images. It was demonstrated theoretically that the virtual image pair is closer to the data at the prediction time than the known Landsat-MODIS image pairs (Wang et al. Citation2020). Inspired by this concept, in this study we constructed the virtual image within the cloud region to provide an initial prediction of the cloud image. Although the equation for building the virtual image (Eq. 3) is similar to the auto-regression used in ARRC [Eq. 2 in Cao et al. (Citation2020)], the essential difference is that ARRC establishes a regression model for each individual pixel, while VICR builds one model for the whole image patch within the cloud region. Furthermore, we innovatively proposed to build two buffer zones around the cloud region, with each playing a different role. Buf1 helps to establish the linear relationship between the time-series reference images and the target image, and buf2 helps to find the optimal set of reference images. The results in Experiment I verified that this strategy is reasonable and the results are reliable. More importantly, it eliminates the need for human intervention in reference image selection, which makes the algorithm more operational.

5.2 Advantages of TWSD and its potential applications

The second improvement of VICR is that the newly proposed TWSD reduces the reconstruction errors, compared with the widely used SD, by incorporating temporal information in similar pixel selection and residual allocation (as illustrated in Section 4.2). The major contribution of TWSD to the accuracy improvement lies in the similar pixel selection process. Most existing similar pixel selection methods assume that pixels similar on the reference image remain similar on the cloudy image (Chen et al. Citation2011), which may not hold when the land cover experiences abrupt change. For the methods based on multi-temporal or time-series reference images (such as ARRC), similar pixels were defined as those with similar temporal trends of spectral reflectance (Chen et al. Citation2017; Cao et al. Citation2020; Yan and Roy Citation2020; Chen et al. Citation2021). These similarity metrics treat the multi-temporal (or time-series) reference images as equally important. In reality, however, the greater the difference between a reference image and the cloudy image, the less reliable the similar pixels selected from that reference image. It should be noted that geostatistical methods can also be used to calculate the weights. Such methods include the Geostatistical Neighborhood Similar Pixel Interpolator (GNSPI) (Zhu et al. Citation2012), which adopts ordinary Kriging to interpolate the residuals. Although GNSPI has been shown to perform better than empirical methods such as NSPI and mNSPI, calculating the semi-variogram increases the computational burden, especially when multiple reference images are used. In this study, we developed TWSD by using the coefficient a_t obtained from the linear regression model as the weight, in order to account for the different contributions of the reference images. A larger a_t value means that the reference image at the corresponding time t contributes more to the prediction of the cloudy image, and therefore that similar pixels identified from this reference image are more reliable; a smaller a_t indicates that similar pixels selected from this reference image are less relevant.
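The way a_t enters the distance can be sketched as follows. The exact TWSD expression is given in the methods section; here we assume, for illustration only, a weighted absolute spectral difference over the time-series reference images (single band, hypothetical names):

```python
import numpy as np

def twsd(refs, a_t, cand, target):
    """Temporally weighted spectral distance between a candidate pixel and the
    cloudy target pixel location, evaluated on the reference images.
    refs: (T, H, W); a_t: (T,) regression coefficients; cand/target: (row, col)."""
    w = np.abs(a_t) / np.abs(a_t).sum()        # larger |a_t| -> more trusted image
    diffs = np.array([abs(refs[t][cand] - refs[t][target]) for t in range(len(a_t))])
    return float(np.sum(w * diffs))

def select_similar_pixels(refs, a_t, target, window, n_similar=20):
    """Rank candidate pixels in a search window by TWSD; keep the n smallest."""
    return sorted(window, key=lambda xy: twsd(refs, a_t, xy, target))[:n_similar]
```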

TWSD also has potential to be applied to other tasks requiring similar pixel selection, such as spatio-temporal fusion (STF) algorithms, where the spectral change in coarse-resolution pixels needs to be interpolated according to the similarity between neighboring pixels and target pixels to reconstruct fine-resolution images (Zhu et al. Citation2018; Liu et al. Citation2019; Zhou et al. Citation2021). Some STF algorithms use multi-temporal Landsat-MODIS image pairs as references. Typical examples include ESTARFM (Enhanced Spatial and Temporal Adaptive Reflectance Fusion Model) (Zhu et al., 2010), STAIR (SaTellite dAta IntegRation) (Luo et al. Citation2020), and VIPSTF (Wang et al. Citation2020). In these algorithms, TWSD could be used to calculate spectral similarity in the temporal domain to better select similar pixels and capture land change.

5.3 Robustness and efficiency of VICR method

Experiment III showed that VICR achieved better accuracies than mNSPI, WLR and ARRC, and the improvement was more obvious in cases with abrupt land cover change. Compared with the other three algorithms, the performance of VICR was less affected when the closest reference image was removed or when the revisit period was increased (Experiment IV), and it was less affected by varying cloud sizes (Experiment V). This is understandable because VICR does not depend on a single reference image; it flexibly selects the optimal set of time-series reference images. In addition, it captures temporal change in the key processes of virtual image construction, similar pixel selection, and residual distribution by taking into account the temporal contribution of each reference image.

The results of Experiment VI showed that VICR achieved good visual results for real cloud removal and produced reasonable time-series NDVI. It requires much less processing time than ARRC, and the cumulative processing time for cloud removal from five years of Landsat time-series images is acceptable (Fig. S10). The good computational efficiency arises because the linear regression for virtual image construction is performed on image patches rather than on individual pixels. Unlike ARRC, which performs autoregression for all neighboring pixels and calculates the contribution of each neighboring pixel to the target pixel, the patch-based virtual image construction substantially improves computational efficiency. The combination of patch-based virtual image construction and pixel-based residual distribution enables both good accuracy and efficiency. The promising performance and high computational efficiency highlight the operational potential of VICR for cloud removal from large volumes of Landsat data.
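The scaling argument can be reproduced with a small, self-contained experiment (hypothetical sizes, random data; this only illustrates why one patch-level regression scales better than many per-pixel fits, and is not a benchmark of the released ARRC code):

```python
import time
import numpy as np

T, n_buf, n_cloud = 8, 5000, 160000          # e.g. a 400 x 400 simulated cloud
refs_buf = np.random.rand(n_buf, T); y_buf = np.random.rand(n_buf)
refs_cloud = np.random.rand(n_cloud, T)

t0 = time.time()
X = np.column_stack([refs_buf, np.ones(n_buf)])
coeffs, *_ = np.linalg.lstsq(X, y_buf, rcond=None)        # one fit per patch/band
virtual = np.column_stack([refs_cloud, np.ones(n_cloud)]) @ coeffs
print(f"patch-based prediction of {n_cloud} pixels: {time.time() - t0:.3f} s")
# A per-pixel scheme would repeat a (smaller) regression n_cloud times instead,
# which is where the gap between VICR and ARRC in Table 3 / Figure 15 comes from.
```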

5.4 Computation efficiency of VICR method

We expect a good cloud removal algorithm to have reasonable computational efficiency in addition to high accuracy, so that it can be applied operationally to large numbers of time-series images. In our experiments, we compared the computing time of mNSPI, WLR, ARRC and VICR in Experiment III (the first sub-experiment) and Experiment V. We also calculated the total computing time of VICR for cloud removal from time-series Landsat images in Experiment VI. Although mNSPI is provided as IDL code while ARRC and VICR are written in MATLAB, previous comparisons have shown comparable efficiency between IDL and MATLAB (Liu et al. Citation2019).

Table 3 lists the computation time of mNSPI, WLR, ARRC, and VICR on the simulated cloud regions in Experiment III. Figure 15 illustrates the computation time averaged over each pixel in cloud regions of different sizes in Experiment V. In Experiment III, the computation times rank as ARRC > mNSPI > VICR > WLR (Table 3). In most cases in Experiment V, the computation time per pixel of VICR is slightly longer than that of mNSPI and WLR. In both experiments, ARRC required much longer time than the other three methods. Interestingly, mNSPI took much longer in Experiment III than in Experiment V even though the cloud sizes are similar (e.g. 93,312 pixels at the Jianghan Plain site in Table 3 vs. the 300 × 300 cloud region in Figure 15). This is probably because mNSPI requires searching for the central pixel of the cloud region, and this process is much less efficient for irregular-shaped clouds than for regular-shaped clouds. Therefore, mNSPI can be expected to show large variation in computational efficiency when reconstructing real cloudy images. The total time of VICR for the largest cloud (160,000 pixels for the 400 × 400 cloud) was only 480 seconds (at most 0.003 seconds per pixel in Figure 15). When processing time-series Landsat images over 5-year periods in Experiment VI, the cumulative processing time was around 15.4 hours (55,000 seconds) for Jianghan Plain (47 images) and 32.8 hours for the Yellow River Delta (72 images) (Fig. S10).

Figure 15. Elapsed time per cloud pixel (in seconds) for mNSPI, WLR, ARRC and VICR for cloud sizes of 50×50, 100×100, 150×150, 200×200, 250×250, 300×300, 350×350 and 400×400 pixels in Experiment V.

Table 3. Elapsed time in seconds for different methods in Experiment III. The numbers of cloud pixels are 93,312, 107,529, 89,813, and 103,174 in Jianghan Plain, Lower Gwydir Catchment, Beijing and Yellow River Delta, respectively.

5.5 Uncertainties and implications for future research

In practical applications of VICR, several factors lead to uncertainties in the cloud removal results. First, VICR still requires the reference images to be cloud free within a cloud region. When the cloud region is very large, it may be difficult to select sufficient reference images. This problem can be addressed by the time-series cloud removal strategy in Experiment VI: images with low cloud cover are given higher priority for cloud removal, and these reconstructed images can then be used as reference images to remove larger clouds. One concern is that errors produced in the images with small cloud cover may magnify the errors for cloud removal in the images with large cloud cover. To address this concern, we conducted a further experiment. We simulated a large cloud on the target image at each site, with the cloud shape taken from the real cloudy images of that site. We removed the cloud following the same procedure as in Experiment III using VICR (hereafter, single image reconstruction). We also inserted the cloud-simulated image into the time-series Landsat images and reconstructed each image in order of cloud cover percentage (the same time-series cloud removal procedure as in Experiment VI). The time-series Landsat images are listed in Table S1 and the reconstructed images are shown in Figure S11. Figure 16 shows that the RMSEs of the reconstructed images from the time-series cloud removal procedure are comparable to, or even lower than, those from single image reconstruction. This is because the cloud-removed images provide temporally closer information for reconstructing the image with larger cloud cover. Both results show acceptable RMSEs and good visual quality (Figure S11), indicating that the time-series cloud removal strategy is suitable for extreme cases where satisfactory reference images are hard to find for a large cloud region.
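The time-series strategy itself is easy to state operationally: images are reconstructed in ascending order of cloud fraction, and each reconstructed image joins the pool of candidate references for the cloudier images that follow. A minimal sketch (hypothetical names; remove_clouds_vicr stands in for the single-image procedure described above):

```python
def timeseries_cloud_removal(images, cloud_masks, remove_clouds_vicr):
    """images: list of (H, W) arrays; cloud_masks: matching boolean arrays.
    remove_clouds_vicr(image, mask, refs) -> reconstructed image (placeholder)."""
    order = sorted(range(len(images)), key=lambda i: cloud_masks[i].mean())
    restored = list(images)
    for i in order:                                  # least cloudy first
        if not cloud_masks[i].any():
            continue                                 # already cloud free
        # All other images, including previously reconstructed ones, are candidate
        # references; VICR then selects the optimal subset for each cloud region.
        refs = [restored[j] for j in range(len(images)) if j != i]
        restored[i] = remove_clouds_vicr(images[i], cloud_masks[i], refs)
    return restored
```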

Figure 16. RMSEs of the reconstructed images from VICR for the single image cloud removal procedure and the time-series image cloud removal procedure at (a) Jianghan Plain, (b) Lower Gwydir Catchment, (c) Beijing and (d) Yellow River Delta.

Second, in this study we tested VICR only on a single Landsat sensor, so the shortest time interval between a reference image and the target image was 16 days. We did not incorporate Landsat ETM+ images mainly because the ETM+ sensor has had missing strips owing to the failure of the scan-line corrector since May 2003. As the Landsat 9 OLI-2 sensor has spectral characteristics similar to Landsat 8 OLI and provides images with an 8-day offset from Landsat 8, the performance of VICR for cloud removal has great potential for enhancement when images from both sensors are included. Future work may also consider incorporating the Harmonized Landsat and Sentinel-2 surface reflectance data set (Claverie et al. Citation2018) to generate time-series cloud-free images with high temporal frequency.

6. Conclusion

This paper proposed a new thick cloud removal algorithm for Landsat images, VICR, based on the concept of the virtual image. Six experiments were performed at four sites representing different landscape patterns and land change dynamics, and VICR was compared with the existing mNSPI, WLR and ARRC algorithms. The main findings are summarized as follows:

  1. VICR achieved better reconstruction accuracies than mNSPI, WLR and ARRC. Results on the cloud-simulated images showed that VICR yielded higher CC, lower RMSE and higher SSIM than mNSPI and WLR (e.g. RMSEs of VICR vs. mNSPI: 0.0288 vs. 0.0405, 0.0286 vs. 0.0382, 0.0193 vs. 0.0227, and 0.0234 vs. 0.0317 in the NIR band for the four sites, respectively). VICR yielded higher CC and lower RMSE than ARRC and in most cases yielded higher SSIM than ARRC (e.g. RMSE of VICR vs. ARRC: 0.0193 vs. 0.0211 and 0.0286 vs. 0.0302; SSIM of VICR vs. ARRC: 0.9374 vs. 0.9375 and 0.8860 vs. 0.8898 in the NIR band for Beijing and Lower Gwydir Catchment, respectively, in the first sub-experiment in Section 4.3). Experiments on both cloud-simulated and real cloud-contaminated images showed that VICR generated visually better reconstruction results than the other algorithms. The accuracy improvement was especially obvious at the sites with abrupt land change, i.e. Lower Gwydir Catchment and Yellow River Delta.

  2. VICR is more robust than mNSPI, WLR, and ARRC in terms of sensitivity to the reference images and to different cloud sizes. When the time interval between images increased, VICR still achieved the lowest RMSEs (e.g. RMSE: 0.0321, 0.0309, 0.0266 and 0.02496 in the NIR band for the four sites in the first sub-experiment in Section 4.4). In addition, VICR showed smaller fluctuations in RMSE as the cloud size increased from 50 × 50 to 400 × 400 pixels.

  3. VICR requires much less computation time than ARRC, and less computation time than mNSPI for irregular-shaped clouds (e.g. 32.8 hours for the 5-year image series in the Yellow River Delta).

  4. The major advantages of VICR are that it allows automatic determination of the optimal set of time-series reference images and that it better incorporates temporal change information. The robustness and efficiency of VICR support its operational use in processing large datasets.

Credit author statement

Zhanpeng Wang: Methodology, Investigation, Data processing, Software, Writing - Original Draft. Yinghai Ke: Conceptualization, Methodology, Investigation, Funding acquisition, Writing - Original Draft, Writing - Review & Editing. Demin Zhou: Supervision, Funding acquisition. Xiaojuan Li: Investigation, Project administration. Huili Gong: Investigation, Project administration.

Acknowledgments

The work is supported by the National Natural Science Foundation of China [grant number 42071396].

Disclosure statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability statement

The data that support the findings of this study are available upon request from the corresponding author, or can be accessed through https://earthexplorer.usgs.gov.

Supplemental data

Supplemental data for this article can be accessed online at https://doi.org/10.1080/15481603.2022.2160411

Additional information

Funding

The work was supported by the National Natural Science Foundation of China [42071396].

References

  • Cao, R., Y. Chen, J. Chen, X. Zhu, and M. Shen. 2020. “Thick Cloud Removal in Landsat Images Based on Autoregression of Landsat Time-Series Data.” Remote Sensing of Environment 249 (November): 112001. doi:10.1016/j.rse.2020.112001.
  • Hu, C., L. Huo, Z. Zhang, and P. Tang. 2019. “Automatic Cloud Removal from Multi-Temporal Landsat Collection 1 Data Using Poisson Blending.” In IGARSS 2019-2019 IEEE International Geoscience and Remote Sensing Symposium, 1661–26. IEEE. doi:10.1109/IGARSS.2019.8899849.
  • Chen, Y., R. Cao, J. Chen, L. Liu, and B. Matsushita. 2021. “A Practical Approach to Reconstruct High-Quality Landsat NDVI Time-Series Data by Gap Filling and the Savitzky–Golay Filter.” ISPRS Journal of Photogrammetry and Remote Sensing 180 (October): 174–190. doi:10.1016/j.isprsjprs.2021.08.015.
  • Chen, S., X. Chen, X. Chen, J. Chen, X. Cao, M. Shen, W. Yang, and X. Cui. 2018. “A Novel Cloud Removal Method Based on IHOT and the Cloud Trajectories for Landsat Imagery.” Remote Sensing 10 (7). doi:10.3390/rs10071040.
  • Cheng, Q., H. Shen, L. Zhang, Q. Yuan, and C. Zeng. 2014. “Cloud Removal for Remotely Sensed Images by Similar Pixel Replacement Guided with a Spatio-Temporal MRF Model.” ISPRS Journal of Photogrammetry and Remote Sensing 92: 54–68. doi:10.1016/j.isprsjprs.2014.02.015.
  • Chen, B., B. Huang, L. Chen, and X. Bing. 2017. “Spatially and Temporally Weighted Regression: A Novel Method to Produce Continuous Cloud-Free Landsat Imagery.” IEEE Transactions on Geoscience and Remote Sensing 55 (1): 27–37. doi:10.1109/TGRS.2016.2580576.
  • Chen, B., Y. Ye, C. Tong, J. Deng, K. Wang, and Y. Hong. 2022. “A Novel Big Data Mining Framework for Reconstructing Large-Scale Daily MAIAC AOD Data Across China from 2000 to 2020.” GIScience & Remote Sensing 59 (1): 670–685. doi:10.1080/15481603.2022.2051382.
  • Chen, J., X. Zhu, J. E. Vogelmann, F. Gao, and S. Jin. 2011. “A Simple and Effective Method for Filling Gaps in Landsat ETM+ SLC-Off Images.” Remote Sensing of Environment 115 (4): 1053–1064. doi:10.1016/j.rse.2010.12.010.
  • Claverie, M., J. Ju, J. G. Masek, J. L. Dungan, E. F. Vermote, J.C. Roger, S. V. Skakun, and C. Justice. 2018. “The Harmonized Landsat and Sentinel-2 Surface Reflectance Data Set.” Remote Sensing of Environment 219: 145–161. doi:10.1016/j.rse.2018.09.002.
  • Fang, Y., T. Ren, S. Zhang, Y. Liu, S. Liao, L. Xiaokun, R. Cong, and L. Jianwei. 2021. “Rotation with Oilseed Rape as the Winter Crop Enhances Rice Yield and Improves Soil Indigenous Nutrient Supply.” Soil and Tillage Research 212 (August): 105065. doi:10.1016/j.still.2021.105065.
  • Gao, J., Y. Yang, T. Wei, and G. Zhang. 2021. “Sentinel-2 Cloud Removal Considering Ground Changes by Fusing Multitemporal SAR and Optical Images.” Remote Sensing 13 (19): 19. doi:10.3390/rs13193998.
  • Huang, Z., X. Liu, Q. Yang, Y. Meng, L. Zhu, and X. Zou. 2021. “Quantifying the Spatiotemporal Characteristics of Multi-Dimensional Karst Ecosystem Stability with Landsat Time Series in Southwest China.” International Journal of Applied Earth Observation and Geoinformation 104 (December): 102575. doi:10.1016/j.jag.2021.102575.
  • Kalkan, K., and M. Derya Maktav. 2018. “A Cloud Removal Algorithm to Generate Cloud and Cloud Shadow Free Images Using Information Cloning.” Journal of the Indian Society of Remote Sensing 46 (8): 1255–1264. doi:10.1007/s12524-018-0806-y.
  • Kokhanovsky, A. A. 2013. “Remote Sensing of Atmospheric Aerosol Using Spaceborne Optical Observations.” Earth-Science Reviews 116 (January): 95–108. doi:10.1016/j.earscirev.2012.10.008.
  • Lin, C.H., P.H. Tsai, K.H. Lai, and J.Y. Chen. 2013. “Cloud Removal from Multitemporal Satellite Images Using Information Cloning.” IEEE Transactions on Geoscience and Remote Sensing 51 (1): 232–241. doi:10.1109/tgrs.2012.2197682.
  • Yan, L., and D. P. Roy. 2020. “Spatially and Temporally Complete Landsat Reflectance Time Series Modelling: The Fill-And-Fit Approach.” Remote Sensing of Environment 241 (May): 111718. doi:10.1016/j.rse.2020.111718.
  • Liu, K., C. Yin, and J. Im. 2019. “Comparison of Five Spatio-Temporal Satellite Image Fusion Models Over Landscapes with Various Spatial Heterogeneity and Temporal Variation.” Remote Sensing 11 (22): 2612. doi:10.3390/rs11222612.
  • Li, L., R. Shi, L. Zhang, J. Zhang, and W. Gao. 2014. “The Data Fusion of Aerosol Optical Thickness Using Universal Kriging and Stepwise Regression in East China.” Vol. 9221, 922112. International Society for Optics and Photonics. doi:10.1117/12.2061764.
  • Luo, Y., K. Guan, J. Peng, S. Wang, and Y. Huang. 2020. “STAIR 2.0: A Generic and Automatic Algorithm to Fuse Modis, Landsat, and Sentinel-2 to Generate 10 M, Daily, and Cloud-/gap-Free Surface Reflectance Product.” Remote Sensing 12 (19). doi:10.3390/rs12193209.
  • Malek, S., F. Melgani, Y. Bazi, and N. Alajlan. 2018. “Reconstructing Cloud-Contaminated Multispectral Images with Contextualized Autoencoder Neural Networks.” IEEE Transactions on Geoscience and Remote Sensing 56 (4): 2270–2282. doi:10.1109/TGRS.2017.2777886.
  • Xu, M., F. Deng, S. Jia, X. Jia, and A. J. Plaza. 2022. “Attention Mechanism-Based Generative Adversarial Networks for Cloud Removal in Landsat Images.” Remote Sensing of Environment 271: 271. doi:10.1016/j.rse.2022.112902.
  • Meraner, A., P. Ebel, X. X. Zhu, and M. Schmitt. 2020. “Cloud Removal in Sentinel-2 Imagery Using a Deep Residual Neural Network and SAR-Optical Data Fusion.” ISPRS Journal of Photogrammetry and Remote Sensing 166 (August): 333–346. doi:10.1016/j.isprsjprs.2020.05.013.
  • Nijland, W., L. Reshitnyk, and E. Rubidge. 2019. “Satellite Remote Sensing of Canopy-Forming Kelp on a Complex Coastline: A Novel Procedure Using the Landsat Image Archive.” Remote Sensing of Environment 220 (January): 41–50. doi:10.1016/j.rse.2018.10.032.
  • Padhee, S. K., and S. Dutta. 2019. “Spatio-Temporal Reconstruction of MODIS NDVI by Regional Land Surface Phenology and Harmonic Analysis of Time-Series.” GIScience & Remote Sensing 56 (8): 1261–1288. doi:10.1080/15481603.2019.1646977.
  • Li, Q., J. Cui, W. Jiang, Q. Jiao, L. Gong, J. Zhang, and X. Shen. 2021. “Monitoring of the Fire in Muli County on March 28, 2020, Based on High Temporal-Spatial Resolution Remote Sensing Techniques.” Natural Hazards Research 1 (1): 20–31. doi:10.1016/j.nhres.2021.02.001.
  • Wang, Q., Y. Tang, X. Tong, and P. M. Atkinson. 2020. “Virtual Image Pair-Based Spatio-Temporal Fusion.” Remote Sensing of Environment 249: 249. doi:10.1016/j.rse.2020.112009.
  • Shen, H., W. Jingan, Q. Cheng, M. Aihemaiti, C. Zhang, and L. Zhiwei. 2019. “A Spatiotemporal Fusion Based Cloud Removal Method for Remote Sensing Images with Land Cover Changes.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 12 (3): 862–874. doi:10.1109/jstars.2019.2898348.
  • Shen, H., X. Li, Q. Cheng, C. Zeng, G. Yang, L. Huifang, and L. Zhang. 2015. “Missing Information Reconstruction of Remote Sensing Data: A Technical Review.” IEEE Geoscience and Remote Sensing Magazine 3 (3): 61–85. doi:10.1109/mgrs.2015.2441912.
  • Shi, Y., M. Burns, R. J. Ritchie, A. Crossan, and I. R. Kennedy. 2014. “Probabilistic Risk Assessment of Diuron and Prometryn in the Gwydir River Catchment, Australia, with the Input of a Novel Bioassay Based on Algal Growth.” Ecotoxicology and Environmental Safety 106 (August): 213–219. doi:10.1016/j.ecoenv.2014.04.027.
  • Siravenha, A. C., D. Sousa, A. Bispo, and E. Pelaes. 2011. “Evaluating Inpainting Methods to the Satellite Images Clouds and Shadows Removing.” In, 56–65. Springer. doi:10.1007/978-3-642-27183-0_7.
  • Vuolo, F., N. Wai-Tim, and C. Atzberger. 2017. “Smoothing and Gap-Filling of High Resolution Multi-Spectral Time Series: Example of Landsat Data.” International Journal of Applied Earth Observation and Geoinformation 57: 202–213. doi:10.1016/j.jag.2016.12.012.
  • Wang, Z., Y. Ke, M. Chen, D. Zhou, L. Zhu, and J. Bai. 2021. “Mapping Coastal Wetlands in the Yellow River Delta, China During 2008–2019: Impacts of Valid Observations, Harmonic Regression, and Critical Months.” International Journal of Remote Sensing 42 (20): 7880–7906. Taylor & Francis. doi:10.1080/01431161.2021.1966852.
  • Du, W., Z. Qin, J. Fan, M. Gao, F. Wang, and B. Abbasi. 2019. “An Efficient Approach to Remove Thick Cloud in VNIR Bands of Multi-Temporal Remote Sensing Images.” Remote Sensing 11 (11): 1284. doi:10.3390/rs11111284.
  • Li, X., L. Wang, Q. Cheng, W. Penghai, W. Gan, and L. Fang. 2019. “Cloud Removal in Remote Sensing Images Using Nonnegative Matrix Factorization and Error Correction.” ISPRS Journal of Photogrammetry and Remote Sensing 148: 103–113. doi:10.1016/j.isprsjprs.2018.12.013.
  • Li, X., H. Shen, L. Huifang, and L. Zhang. 2016. “Patch Matching-Based Multitemporal Group Sparse Representation for the Missing Information Reconstruction of Remote-Sensing Images.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 9 (8): 3629–3641. doi:10.1109/JSTARS.2016.2533547.
  • Zeng, C., H. Shen, and L. Zhang. 2013. “Recovering Missing Pixels for Landsat ETM+ SLC-Off Imagery Using Multi-Temporal Regression Analysis and a Regularization Method.” Remote Sensing of Environment 131 (April): 182–194. doi:10.1016/j.rse.2012.12.012.
  • Zhang, Q., Q. Yuan, L. Jie, L. Zhiwei, H. Shen, and L. Zhang. 2020. “Thick Cloud and Cloud Shadow Removal in Multitemporal Imagery Using Progressively Spatio-Temporal Patch Group Deep Learning.” ISPRS Journal of Photogrammetry and Remote Sensing 162: 148–160. doi:10.1016/j.isprsjprs.2020.02.008.
  • Zhang, Q., Q. Yuan, C. Zeng, L. Xinghua, and Y. Wei. 2018. “Missing Data Reconstruction in Remote Sensing Image with a Unified Spatial–Temporal–Spectral Deep Convolutional Neural Network.” IEEE Transactions on Geoscience and Remote Sensing 56 (8): 4274–4288. doi:10.1109/tgrs.2018.2810208.
  • Zhang, Q., Q. Yuan, L. Zhiwei, F. Sun, and L. Zhang. 2021. “Combined Deep Prior with Low-Rank Tensor SVD for Thick Cloud Removal in Multitemporal Images.” ISPRS Journal of Photogrammetry and Remote Sensing 177: 161–173. doi:10.1016/j.isprsjprs.2021.04.021.
  • Zheng, J., X.Y. Liu, and X. Wang. 2021. “Single Image Cloud Removal Using U-Net and Generative Adversarial Networks.” IEEE Transactions on Geoscience and Remote Sensing 59 (8): 6371–6385. doi:10.1109/tgrs.2020.3027819.
  • Zhou, W., A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. 2004. “Image Quality Assessment: From Error Visibility to Structural Similarity.” IEEE Transactions on Image Processing 13 (4): 600–612. doi:10.1109/TIP.2003.819861.
  • Zhou, J., J. Chen, X. Chen, X. Zhu, Y. Qiu, H. Song, Y. Rao, C. Zhang, X. Cao, and X. Cui. 2021. “Sensitivity of Six Typical Spatiotemporal Fusion Methods to Different Influential Factors: A Comparative Study for a Normalized Difference Vegetation Index Time Series Reconstruction.” Remote Sensing of Environment 252. doi:10.1016/j.rse.2020.112130.
  • Zhou, Y., S. Wang, T. Wu, L. Feng, W. Wu, J. Luo, X. Zhang, and N. Yan. 2022. “For-Backward LSTM-Based Missing Data Reconstruction for Time-Series Landsat Images.” GIScience & Remote Sensing 59 (1): 410–430. doi:10.1080/15481603.2022.2031549.
  • Zhu, X., F. Cai, J. Tian, and T. Williams. 2018. “Spatiotemporal Fusion of Multisource Remote Sensing Data: Literature Survey, Taxonomy, Principles, Applications, and Future Directions.” Remote Sensing 10 (4). doi:10.3390/rs10040527.
  • Zhu, X., F. Gao, D. Liu, and J. Chen. 2012. “A Modified Neighborhood Similar Pixel Interpolator Approach for Removing Thick Clouds in Landsat Images.” IEEE Geoscience and Remote Sensing Letters 9 (3): 521–525. doi:10.1109/LGRS.2011.2173290.
  • Zhu, Z., J. Zhang, Z. Yang, A. H. Aljaddani, W. B. Cohen, S. Qiu, and C. Zhou. 2020. “Corrigendum to Continuous Monitoring of Land Disturbance Based on Landsat Time Series, Remote Sensing of Environment, 238, (2020), 11116.” Remote Sensing of Environment 244 (July): 111824. doi:10.1016/j.rse.2020.111824.