Review Article

A survey on underwater coral image segmentation based on deep learning

Received 26 Nov 2023, Accepted 10 Apr 2024, Published online: 01 May 2024

ABSTRACT

Image-based coral reef survey technologies have revolutionized the monitoring of coral reefs by offering a cost-effective and noninvasive method for collecting data across large spatial scales and extended periods. Among these technologies, underwater videography has emerged as a well-established and reliable tool for remote sensing in coral research. Automatic segmentation of coral images represents a forward-looking and fundamental research area in underwater remote sensing. It aims to address a major challenge that limits traditional in situ underwater coral survey research: the difficulty of automatically generating accurate and reproducible high-resolution maps of the underlying coral reef ecosystems. Understanding recent achievements and their relevance to coral ecology monitoring needs is crucial for future planning. This paper presents a literature review on underwater coral image segmentation, focusing on the deep learning implementation pipeline. Furthermore, we introduce a new densely annotated dataset specifically designed for the semantic segmentation of underwater coral images. We systematically evaluate State-of-the-Art (SOTA) methodologies and novel techniques not previously applied to coral image semantic segmentation using the proposed dataset. We then discuss their feasibility in this context. Our goal for this review is to spark innovative ideas and directions for future research in underwater coral image segmentation and to provide readers with an accessible overview of some of the most significant advancements in this field over the past decade. By accomplishing these objectives, we hope to advance research in underwater coral image segmentation and support the development of effective monitoring and conservation strategies for coral reef ecosystems.

1. Introduction

Coral reefs are a vital part of the marine ecosystem, akin to towering structures in a cityscape, offering essential habitats for a wide variety of marine life, as well as delivering significant economic and cultural benefits to coastal communities worldwide (Asner et al. Citation2022; Bewley et al. Citation2015; Muruga, Siqueira, and Bellwood Citation2024). Unfortunately, coral reefs are significantly threatened by human activities, including overfishing, marine pollution, destructive fishing practices, and coastal development, along with the natural threats of global warming and ocean acidification (Horoszowski-Fridman et al. Citation2024; Hughes et al. Citation2017). The cumulative impact of these pressures has led to widespread degradation of coral reef ecosystems. Currently, the Status of Coral Reefs of the World report indicates a loss of approximately 14% of coral reefs since 2009, with projected losses reaching 70%-90% if global warming exceeds 1.5°C (Benkwitt et al. Citation2023; IPCC Citation2023; Souter et al. Citation2021). To provide early warnings, assess adverse factors, and ensure the resilience of coral reef ecosystems, continuous monitoring of their benthic communities is imperative. Regular remote sensing ecology monitoring of the status of coral reefs is essential for evaluating the adverse effects of stressors and enabling prompt responses (Asner et al. Citation2020; Zhong et al. Citation2023).

Over the past decades, satellite, aerial, and drone photogrammetry and remote sensing have played significant roles in the long-term and large-scale ecology monitoring of coral reefs (Casella et al. Citation2022; Giles et al. Citation2023; Liu et al. Citation2014; Lyons et al. Citation2024). Rich and effective image data serve as a visual gateway into the intricate dynamics of coral reef ecosystems, allowing scientists to intuitively understand their critical growth patterns, spatial relationships, and biodiversity. Compared to traditional direct observations by divers, remote sensing of coral reefs through these sensors has greatly accelerated the monitoring of coral spatial structures and distribution mapping, improving the efficiency of data collection. However, these methods face several challenges that are difficult to overcome, primarily adverse weather conditions, as well as image distortion and occlusion caused by water surface reflection or refraction. These constraints collectively limit their capacity to furnish precise, accurate, and detailed data on coral reefs (Shihavuddin et al. Citation2013). Although ship-based remote sensing techniques can generate valuable information such as bathymetry, reef percent cover, and rugosity at higher spatial resolution (Asner et al. Citation2020; Shihavuddin et al. Citation2013), their use requires a significant trade-off between area coverage and access, especially in nearshore reef habitats, which are riddled with undersea obstacles that are hazardous to ships. In the past decade, the use of underwater imaging technology for onsite data recording and archiving has become increasingly popular, making research into direct acquisition of coral in situ data via images or videos underwater more prevalent. Moreover, with advancements in image processing methods, an increasing number of marine scientists have begun to use annotated underwater coral data for statistical analyses of coral habitat coverage and health status (Beijbom et al. Citation2012). In recent years, the rapid development of Unmanned Underwater Vehicle (UUV) capabilities for automated operations underwater has made long-term, low-cost remote sensing observations of coral ecosystems more accessible. This technology represents an underwater remote sensing monitoring method for coral habitats that is both high-resolution and cost-effective (Qin et al. Citation2022; Zhang, Grün, and Li Citation2022).

With the acquisition of a vast amount of underwater coral remote sensing data, the traditional method of manually annotating coral images has left a large volume of image data unprocessed in a timely manner. Moreover, different annotators and annotation methods can introduce significant variability and bias (Curtis et al. Citation2024). A survey by the National Oceanic and Atmospheric Administration (NOAA) states that among the millions of underwater coral reef images acquired each year, only 1–2% are subsequently analyzed by experts (Pavoni et al. Citation2021). Meanwhile, valuable undersea scientific observation data from coral reef ecosystems lie dormant in the various image libraries we have built, awaiting annotation, classification, and mapping. Fortunately, in recent years, with the rapid development of photogrammetric computer vision and machine learning, automatic image segmentation technology has not only become fundamental to image understanding in general computer vision but has also emerged prominently in the task of image interpretation for underwater coral remote sensing monitoring.

1.1. Traditional image segmentation techniques

Image segmentation aims at grouping similar regions or segments of an image under their respective class labels. Traditional machine learning methods have been widely employed in image segmentation tasks, such as the Support Vector Machine (SVM) (Wang, Fan, and Wang Citation2021) and K-Nearest Neighbor (KNN) (Li et al. Citation2023). These algorithms typically rely on pre-designed features like texture, color, and shape. During the training phase, they learn the relationship between these features and their corresponding labels using a dataset. Specifically, the SVM algorithm seeks a decision boundary to separate different categories, while the KNN algorithm determines categories by measuring the distance between samples, arguably making it the simplest and most intuitive machine learning algorithm. These approaches are capable of effectively recognizing and classifying underwater coral images (Beijbom et al. Citation2012, Citation2015; Shihavuddin et al. Citation2013). Although prior studies employing these algorithms have demonstrated commendable accuracy, their applicability across diverse classes or datasets remains a challenge. This challenge arises from the prerequisite of pre-designing specific features to better describe the target class. Given that each class and dataset possesses distinct salient features, adapting these algorithms to novel contexts proves intricate. Previous research has shown that traditional machine learning models hold a classification advantage on small-scale datasets, while deep learning models achieve better recognition accuracy and scale more effectively to large datasets (Wang, Fan, and Wang Citation2021).

1.2. Deep learning-based techniques

The emergence and widespread application of deep neural networks have revolutionized traditional approaches in machine learning paradigms, significantly enhancing the capability to process complex image data and improving overall segmentation system performance. Hence, the advances in deep learning have provided researchers in a variety of fields with sufficient cause to reexamine traditional methods for image segmentation and to determine whether deep learning approaches can indeed improve performance. One such field is coral reef ecology, where many approaches to assessing the ecological state of coral reef ecosystems entail the analysis of underwater image data on the spatial distribution of benthos. These data are commonly obtained from underwater images acquired in situ by human divers or by autonomous or remotely operated UUVs. Deep learning-based methods excel in automatically learning the features necessary for segmentation. Through extensive research and practical applications, an increasing number of deep learning-based algorithms now contribute to the efficient identification of objects in coral reefs, improving the accuracy of image segmentation for underwater coral images. These algorithms have shown that modern deep learning architectures are indeed capable of outperforming conventional methods for segmentation in underwater coral reef images (King, Bhandarkar, and Hopkinson Citation2018; Lütjens and Sternberg Citation2021). However, challenges persist in the segmentation of target objects due to substantial morphological variability within and among coral species, coupled with diverse regional species pools. The remarkable strides made in recognizing specific objects, object classes, and scenes on benchmark datasets (e.g. Caltech 101 (Li, Fergus, and Perona Citation2004), Pascal (Everingham et al. Citation2010)) cannot be straightforwardly transferred to address the new challenges of recognition in underwater coral images. Consequently, the development of a systematic pipeline for accurate underwater coral image segmentation remains a complex challenge, demanding simultaneous consideration of processing speed, cost, and accuracy. Convolutional Neural Networks (CNNs) have been a cornerstone in the field of deep learning, and their prominence surged in 2012, when AlexNet, designed by Alex Krizhevsky and colleagues, won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) by a significant margin, showcasing the powerful capabilities of CNNs in image recognition tasks (Krizhevsky et al. Citation2012). This milestone marked a turning point, significantly boosting interest and research in deep learning, facilitated by advancements in computing power, the availability of large-scale image datasets, and improvements in neural network training techniques (LeCun, Bengio, and Hinton Citation2015). According to the published literature, commonly employed neural networks in coral image segmentation tasks include the Residual Network (ResNet), UNet, and the DeepLab series of models (Pavoni et al. Citation2019, Citation2020; Zhang, Grün, and Li Citation2022). These deep neural networks can extract multi-scale context information from the image through convolution operations and demonstrate better generalization than shallow networks with the same number of parameters, which is exactly what coral feature extraction requires.
In addition, the Transformer, a novel neural network architecture based on the attention mechanism, has revolutionized computer vision in recent years (Strudel et al. Citation2021), but it has yet to be widely introduced into underwater coral image recognition and segmentation. While coral image classification based on deep learning is a promising trend, it is also limited by key factors such as dataset partitioning, data augmentation, network structure, and generalization ability.

Several articles have reviewed SOTA techniques in deep learning for coral remote sensing (Hedley et al. Citation2016; Mittal, Srivastava, and Jayanth Citation2022; Moniruzzaman et al. Citation2017; Raphael et al. Citation2020). These reviews often cover a wide range of issues, from aerial to underwater perspectives in remote sensing, with a primary focus on satellite or aerial coral remote sensing. While there is a growing number of studies on the classification and segmentation of underwater coral remote sensing images, few reviews capture the latest research advancements and cross-domain developments in image segmentation. To address this gap, this paper will summarize and analyze existing literature and methods in deep learning for underwater coral image segmentation, discussing the potential application of the latest research progress from other fields. It is worth noting that accurately distinguishing corals from other seabed elements in underwater images and obtaining precise semantic information are among the most pressing issues in coral surveys. Therefore, these coral image semantic segmentation methods mainly utilize datasets with annotations to train segmentation networks. This fully supervised approach to coral image semantic segmentation has shown great potential in improving the accuracy of coral reef ecological monitoring and analysis. Consequently, our discussions will mainly focus on fully supervised segmentation methods, providing a comprehensive review of current deep learning-based methods for underwater coral image segmentation. We will cover the key steps required for implementing deep learning-based underwater coral image segmentation methods, including data annotation, data augmentation, data partitioning, neural network models, loss functions, and evaluation metrics. Figure 1 illustrates the improvements in these steps. We will conduct comprehensive custom experiments to validate representative or newly advanced methods, with the goal of assisting coral ecologists and spatial information remote sensing scientists engaged in this field, thereby promoting research innovation.

Figure 1. The development trend of coral image segmentation. Data annotation: This process involves marking different regions of an image with labels that represent specific categories or objects. Data partitioning: This step includes not only dividing the dataset into training, validation, and test sets but also segmenting large images into smaller patches suitable for network input. Data augmentation: This technique is used to increase the diversity of your dataset by applying a series of transformations to the original image data. Multi-model data preparation: This involves integrating various types of data sources or modalities to improve the segmentation performance of deep learning models. Network design and optimization: These processes involve selecting and refining the architecture of the neural network to improve its performance in distinguishing between different segments of an image. Loss function selection: This is a critical step that can significantly influence the effectiveness and efficiency of a deep learning model in image segmentation tasks.

2. Background

As previously mentioned, deep learning is considered one of the most recent and astonishing advancements in the field of machine learning. Its ability to autonomously learn complex features from large datasets has surpassed traditional machine learning methods, particularly in areas such as image segmentation and recognition. This has made it the leading technology for understanding and processing images. In underwater coral image segmentation, using annotated training images of underwater corals as input for deep learning image segmentation models allows for the extraction of features from various coral objects, ultimately enabling the automatic segmentation of coral images to obtain the semantic maps of coral habitats we need. This has become a crucial task in image-based coral reef ecological monitoring. Currently, based on a review of the literature in this field, this paper categorizes the deep learning-based supervised neural network segmentation methods for underwater coral images into two groups: patch classification based on random point annotation and semantic segmentation based on pixel-level labels. By delineating differences in input data and network output, the segmentation methods are further refined into three distinct types: patch-based classification, superpixel-based segmentation, and software-labeled segmentation. Notably, superpixel-based segmentation and software-labeled segmentation are both identified as forms of semantic segmentation based on pixel-level labels, as illustrated in Figure 2.

Figure 2. The process diagram of (a) patch-based classification, (b) superpixel-based segmentation, and (c) software-labeled segmentation.

2.1. Patch classification based on random point annotation

This method involves segmenting underwater coral images into multiple small blocks or “patches” and classifying them based on annotations of random points within these patches. Each patch is assigned a category label based on the features it contains, and these patches can then be further processed to produce a classification outcome for the entire image. The research community refers to patch classification based on random point annotation as “local image segmentation,” where low-resolution patches can be assigned class labels, then color-coded and combined to create classification masks. These methods are generally based on CNN architectures, widely used for various computer vision tasks, which comprise convolution, pooling, fully connected, and output layers. In the context of underwater coral image segmentation, patch classification based on random point annotation typically involves extracting non-overlapping cropped patches from high-resolution images as inputs to CNN models. Studies have shown the benefit of considering various patch sizes during training rather than relying solely on a fixed scale for the input data (Modasshir, Li, and Rekleitis Citation2018). The final classification target in CNNs can be either an image patch or the entire image, predicting each patch’s class label or assigning a class to the entire image based on a majority vote among patch labels (Roy et al. Citation2019). Similarly, images can be subdivided into overlapping patches for classification, which are subsequently assembled into a probability heatmap of the entire image, highlighting potential target areas (Lam et al. Citation2018; Wang et al. Citation2016).
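
To make this pipeline concrete, the following minimal sketch crops a fixed-size patch around each annotated point and classifies it with a standard CNN. The 224-pixel patch size, the nine-class output head, the ResNet-18 backbone, and the helper names are illustrative assumptions rather than the configuration of any cited study.

```python
# Sketch: patch classification from random point annotations (PyTorch).
# Assumptions: 224-pixel patches, 9 benthic classes, ImageNet-pretrained ResNet-18.
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

PATCH_SIZE = 224
NUM_CLASSES = 9  # hypothetical label set

def extract_patch(image, row, col, size=PATCH_SIZE):
    """Crop a square patch centered on an annotated point, clamped to image bounds."""
    half = size // 2
    left = max(0, min(col - half, image.width - size))
    top = max(0, min(row - half, image.height - size))
    return image.crop((left, top, left + size, top + size))

# Pretrained backbone with a new classification head for the coral label set.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def classify_points(image_path, points):
    """Predict one class label per annotated (row, col) point in the image."""
    image = Image.open(image_path).convert("RGB")
    batch = torch.stack([preprocess(extract_patch(image, r, c)) for r, c in points])
    model.eval()
    with torch.no_grad():
        return model(batch).argmax(dim=1).tolist()
```

In practice, training would minimize a cross-entropy loss over batches of such patches, and the per-point predictions could then be color-coded and combined into a coarse classification mask as described above.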

Various CNN architectures, such as Inception V1 and Inception V3, have been applied in the field of coral image segmentation. These architectures are often combined with a set of CNNs (Lu et al. Citation2015) or a two-stage framework (Yan et al. Citation2016) to extract features from multiple image patches. Additionally, researchers have explored a novel convolution operation in which random patches extracted from the image serve as the convolution kernels used to generate feature maps from the input data (Xu et al. Citation2018). Although CNNs are the most common and effective automatic feature extractors in image segmentation tasks, other automatic feature extractors include the Stacked Sparse Autoencoder (SSAE) by Shan and Li (Citation2016), the Deep Belief Network (DBN) (Hirra et al. Citation2021), and Long Short-Term Memory (LSTM) (Bunk et al. Citation2017), all of which can be used for image classification or segmentation. To capture spatial relationships between patches and restore global context, some networks have incorporated attention mechanisms (Wang et al. Citation2016). In the context of neural networks, the attention mechanism refers to a set of computational processes that enable the model to selectively focus on specific regions or features within the input data. It is worth noting that patch classification based on random point annotation does not aim to delineate precise object boundaries; rather, it focuses on identifying and classifying the main content or objects present in the image (King, Bhandarkar, and Hopkinson Citation2018). Because it facilitates the rapid processing of large amounts of data, especially when annotation resources are limited, it is well suited to quick preliminary estimates of metrics such as coral cover.

2.2. Semantic segmentation based on pixel-level labels

Unlike patch-based classification, semantic segmentation focuses on classifying each pixel in the image, aiming to segment the image into multiple regions representing different categories (e.g. various types of corals, underwater terrain, etc.). Each pixel is assigned a category label, thereby generating a detailed understanding of the entire image content. Due to the complexity of coral structures, two different label generation approaches are commonly used for pixel-wise coral segmentation, addressing the annotation challenges. The superpixel-based method extends the ground truth labels from annotated points to estimate the main object boundaries in the image, providing a label for each pixel. This concept has been widely applied in medical image segmentation, achieving a coarse-to-fine division of pathological regions through superpixel segmentation (Albayrak and Bilgin Citation2019; Daoud et al. Citation2019; Kopecky et al. Citation2023; Zhang et al. Citation2019). Moreover, some post-processing techniques for optimization have been explored (Xiong et al. Citation2017). Generating pixel-wise labels is relatively complex and typically involves the use of open-source semi-automated annotation tools.
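
As a concrete illustration of the superpixel-based label extension, the sketch below propagates sparse point labels to an approximate dense mask by assigning each SLIC superpixel the majority class of the annotated points it contains. The segment count and the ignore value for unlabeled pixels are assumptions for illustration, not the exact procedure of any cited work.

```python
# Sketch: expanding sparse point annotations into approximate dense labels
# via SLIC superpixels (scikit-image). Unlabeled superpixels keep an ignore value.
import numpy as np
from skimage.segmentation import slic

IGNORE_LABEL = 255  # hypothetical value for pixels left unlabeled

def points_to_dense_labels(image, points, n_segments=2000):
    """points: iterable of (row, col, class_id); returns an HxW label map."""
    segments = slic(image, n_segments=n_segments, compactness=10, start_label=0)
    dense = np.full(segments.shape, IGNORE_LABEL, dtype=np.uint8)
    for seg_id in np.unique(segments):
        votes = [cls for r, c, cls in points if segments[r, c] == seg_id]
        if votes:  # only label superpixels containing at least one annotated point
            dense[segments == seg_id] = max(set(votes), key=votes.count)
    return dense
```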

In recent years, semantic segmentation based on pixel-wise annotation has gained popularity in coral image classification, owing to advancements in Fully Convolutional Neural Networks (FCNNs). The FCNN represents a significant advancement by adapting an existing CNN for semantic segmentation tasks. It can process images of any size and replaces fully connected layers with convolutional layers to produce spatial feature maps instead of feature vectors (Long, Shelhamer, and Darrell Citation2015). Besides FCNN architectures, encoder-decoder architectures have also become popular in pixel-wise coral segmentation (Badrinarayanan, Kendall, and Cipolla Citation2017; Ronneberger, Fischer, and Brox Citation2015). This design effectively leverages the spatial information of the image to achieve precise segmentation. Additionally, several methods have been proposed to improve the network’s grasp of global context information, including the use of conditional random fields (Chen et al. Citation2014), dilated convolutions (Mahmood et al. Citation2017; Yu and Koltun Citation2015), skip connections (Ronneberger, Fischer, and Brox Citation2015), attention mechanisms (Hu, Shen, and Sun Citation2018), multi-scale processing (Chen et al. Citation2018), and so on. Figure 2 showcases three different approaches to underwater coral image segmentation or classification, clearly illustrating the differences in annotation methods and the significant variation in network prediction results from CNN to FCNN. Specifically, patch-based classification provides a coarse-grained classification of coral types in the image, superpixel-based segmentation offers a more compact and structured representation of the coral reef, and software-labeled (dense) segmentation delivers pixel-level accuracy in identifying coral and other underwater elements. Although semantic segmentation based on pixel-level labels requires precise annotation of each pixel’s category in the image, this method offers a deep understanding of the image content, suitable for applications requiring high-precision analysis of underwater coral reefs. Of course, the choice of method depends on the specific monitoring goals and available resources.

3. Benchmark datasets and challenges in underwater coral image segmentation

3.1. Benchmark datasets

Traditional image segmentation methods rely on predetermined color and texture classifiers to segment underwater coral images, limiting their flexibility and adaptability when dealing with complex environments (King, Bhandarkar, and Hopkinson Citation2018; Shihavuddin et al. Citation2013). In contrast, humans do not rely solely on visual cues like color and texture for object recognition; they also utilize a wealth of contextual knowledge, a process that traditional methods often fail to mimic. As mentioned earlier, compared to traditional approaches, deep learning techniques can effectively model domain knowledge using datasets of labeled pixels. This means that deep networks can capture more complex image features and patterns, more effectively mimicking human visual perception and handling complex image segmentation tasks with increased accuracy and robustness. However, the success of deep learning methods heavily depends on large, diverse datasets of labeled data. These datasets provide the necessary information to train networks to recognize a variety of image features and objects, making the construction and management of these datasets key to efficient image segmentation. Moreover, for a fair evaluation and comparison of different image segmentation algorithms, standardized datasets representing specific domains are needed. This ensures the validity of comparisons while also fostering collaboration and progress within the research community. This section highlights some of the popular and diverse datasets used for coral image segmentation, which vary in size, annotation methods, and the locations and specific environments from which the data was collected, as shown in Table 1.

Table 1. Coral benchmark datasets.

Here is a more detailed introduction to the datasets listed in Table 1, along with the web addresses where they can be accessed.

  • TasCPC (Barrett et al. Citation2011): The Tasmania Coral Point Count (TasCPC) dataset contains 1,258 benthos images captured by an Autonomous Underwater Vehicle (AUV), with each image having 50 random points annotated by experts. The dataset includes 13 types of labels covering biological species (such as types of sponges, corals, algae, and others), abiotic elements (such as sand, gravel, rock, shells, etc.), and categories of unknown data; all these labels are indicated in a CSV file containing the image metadata. Available at http://marine.acfr.usyd.edu.au/datasets/.

  • MLC (Beijbom et al. Citation2012): The Moorea Labeled Corals (MLC) dataset is a subset of the Moorea Coral Reef Long Term Ecological Research (MCR LTER) project, which has been collecting image data from Moorea Island since 2005. It includes 2,055 images from 2008, 2009, and 2010, collected from three habitats: fringing reef, outer 10 m, and outer 17 m. The dataset features random point annotation (row, column, label) with nine labels, including five coral genera: (1) Acropora, (2) Pavona, (3) Montipora, (4) Pocillopora, and (5) Porites, along with four non-coral labels: (6) Crustose Coralline Algae (CCA), (7) Macroalgae, (8) Sand, and (9) Turf algae. These nine classes account for 96% of the annotations, totaling almost 400,000 points. Available at http://vision.ucsd.edu/data.

  • EILAT (Shihavuddin et al. Citation2013): The EILAT dataset contains 1123 image patches of size 64 × 64, taken with the same camera from coral reefs near Eilat in the Red Sea, and it was named after the location, Eilat. Experts have visually classified these images into eight classes: branches type III, branches type II, branches type I, brain coral, urchin, favid coral, dead coral and sand. Available at https://data.mendeley.com/datasets/86y667257h/2.

  • RSMAS (Shihavuddin et al. Citation2013): The RSMAS dataset consists of 766 image patches of size 256 × 256, collected by divers from the Rosenstiel School of Marine and Atmospheric Sciences at the University of Miami. The images were taken with different cameras at different times and places, and include 14 classes: Acropora cervicornis, Acropora palmata, Colpophyllia natans, Diadema antillarum, Diploria strigosa, Gorgonians, Millepora alcicornis, Montastraea cavernosa, Meandrina meandrites, Montipora spp., Palythoas palythoa, Sponge fungus, Siderastrea siderea and tunicates. Available at https://data.mendeley.com/datasets/86y667257h/2.

  • Benthoz15 (Bewley et al. Citation2015): This Australian benthic dataset contains 407,968 expert-labeled points across 9,874 georeferenced images with associated sensor data, captured by AUVs, with up to 50 pixels randomly annotated in each image. The images were collected from nine sites around Australia between 2008 and 2013. The dataset features 148 expert label classes, available in a CSV file where each line represents a single labeled point within an image. Available at http://squidle.acfr.usyd.edu.au.

  • EFC (Beijbom et al. Citation2016): The EFC dataset divides its 212 annotated image-pairs into 142 training image-pairs and 70 validation image-pairs. Each image-pair contains a reflectance image and its corresponding fluorescence image, with 200 annotated points in each image. The dataset has nine labels: Acropora, Bare-subst., Faviidae, Millepora, Platygyra, Pocillopora, Other Inv., Other Hard Coral and Unknown. Available at https://doi.org/10.5061/dryad.t4362.

  • UCSD Mosaics (Edwards et al. Citation2017): UCSD Mosaics is the only publicly available coral dataset that provides dense ground truth annotations. The original dataset contains 16 mosaics with resolution of over 10K × 10K. The version of the dataset provided by Alonso et al. (Citation2019) includes 4193 training images and 729 test images, each with a size of 512 × 512 pixels. The dataset features 34 dense semantic labels in addition to the background class. Available at https://sites.google.com/a/unizar.es/semanticseg/home.

  • SSPQICD (González-Rivero et al. Citation2019): The Seaview Survey Photo-quadrat and Image Classification Dataset encompasses over one million standardized downward-facing “photo-quadrat” images, each covering approximately 1 m2 of the seafloor. These images were collected between 2012 and 2018 at 860 transect locations worldwide, including regions such as the Caribbean, Bermuda, and the Indian Ocean, and the dataset also features multi-temporal images for specific sites. The dataset contains a total of six functional groups (Algae, Sponge, Hard Coral, Soft Coral, Other Invertebrates, and Others) and 182 subcategories. It includes 11,383 expert-annotated training images, each with 50 random points, out of roughly 1.1 million images in total. Available at https://espace.library.uq.edu.au/view/UQ:734799.

  • ATCRC (Rashid and Chennu Citation2020): The dataset comprises 147 underwater scenes of coral reefs near Curaçao, surveyed between August 4th and 26th, 2016. Captured through hyperspectral imaging, each 50-m transect contains approximately 2.29 billion pixels, providing spectral data across 400 wavelengths. Annotations of 31 hyperspectral images, including 23 comparative transects, detail 47 labels for habitat descriptors and biodiversity, focusing on sessile biota, substrate types, and other abiotic elements. Available at https://doi.org/10.3390/data5010019.

3.2. Challenges

In this section, we will explore the key challenges encountered by deep learning algorithms when effectively delineating corals in underwater images. Our exploration encompasses a thorough examination of diverse factors that exert a substantial impact on segmentation accuracy, as outlined below:

3.2.1. Image quality

Due to the complex imaging environment and challenging illumination conditions, effects like color shift, scattering, lens distortion, and chromatic aberration can modify the visual attributes of objects (Fayaz, Parah, and Qureshi Citation2023; Zhou et al. Citation2024). Coral reefs often display blurred outlines and are misclassified at varying distances, causing confusion for both human observers and recognition algorithms (Pavoni et al. Citation2021). In Figure 3(a), the left image represents a standard, unprocessed underwater image. To address the color defects in underwater images, researchers have conducted extensive studies on restoring their degradation using deep learning techniques (Islam et al. Citation2020a; Li et al. Citation2019; Zhou et al. Citation2023, Citation2024). These approaches aim to enhance the visual quality and improve the suitability of underwater images for subsequent analysis. Some work explores the use of orthophotos for coral segmentation (Pavoni et al. Citation2019, Citation2020). Orthophotos are corrected for geometric distortions, resulting in a map-like representation that fixes the distance from the lens to the object and ensures a consistent perspective across all data. However, it is essential to acknowledge that the interpolation process involved in orthophoto generation may lead to local ghosting and blurring. The stitching and mosaicking operations may also result in geometric distortion of the final map. The right image of Figure 3(a) demonstrates the potential distortion of corals caused by orthographic image processing. In conclusion, image quality significantly impacts the difficulty of identifying and segmenting coral regions, thus affecting the overall accuracy of image segmentation.

Figure 3. Challenges in coral image segmentation: (a) color defect of underwater image and distortion due to orthophoto generation; (b) complex morphology of corals and background in distinguish ability; (c) variations in abundance among coral classes; (d) mislabeling of dead corals as alive by experts (pink: live Pocillopora, light pink: dead Pocillopora, light blue: Porites).

3.2.2. Complex morphology

Given the dense population of coral reefs, it is crucial to account for the complex spatial morphology exhibited by various corals. Coral reef skeletons are hugely variable in morphology, including branching, encrusting, or massive forms. These differences reflect the innate characteristics of different coral species, which also change with age (Todd Citation2008). Moreover, significant intraspecific variability exists among samples within the same class. Identifying dead corals presents a unique set of challenges, as their surfaces often display distinctive features, such as wormholes, rust spots, or extensive algae coverage. These visual changes are due to higher rates of bioerosion in dead corals compared to their living counterparts (Glynn and Manzello Citation2015). Figure 3(b) provides a visual representation of the complex morphology observed in coral reefs, further emphasizing the difficulties associated with segmentation tasks.

3.2.3. Class imbalance

The distribution of corals is closely related to their habitat. Generally, there are significant variations in the abundance of distinct coral functional groups within the same area, often by factors of ten or even hundreds (Bellwood et al. Citation2004). This distribution pattern results in captured images having class frequencies that are imbalanced to varying degrees. The presence of class imbalance detrimentally affects the classifier’s output, as it tends to focus more on the major classes while penalizing the minority ones. As depicted in Figure 3(c), there are notable disparities in coverage among different coral classes. In this area, Porites, marked in light blue, represents an underrepresented class, making it a challenging case for accurate automatic segmentation due to the scarcity of positive examples.

3.2.4. Human error

Human observers can make errors when recognizing different benthic groups in underwater videos or images (Curtis et al. Citation2024). Interestingly, hard corals, which play a vital role as builders and creators of coral reef ecosystems, exhibited the least accurate and most variable classification levels (Ninio et al. Citation2003). Furthermore, the subjective annotation of training samples by different experts also introduces an additional factor that may impact the network’s reliability. An example illustrating this effect is presented in Figure 3(d), where an expert erroneously labels a dead Pocillopora as alive, potentially leading to misguided network learning. This highlights the substantial influence of human error on the quality of training data and subsequent segmentation results. To address these concerns, robust protocols for expert annotation and measures to ensure inter-annotator agreement should be established to enhance the overall reliability of the dataset.

4. Data annotation, partitioning, and augmentation

4.1. Data annotation

The process of transforming captured underwater coral images into quantitative data is a resource-intensive task, demanding significant time and labor. In the domain of marine research, there is a growing need for efficient image analysis tools to assist with coral class annotation. One such tool is CoralNet Beta, which employs random point annotation, and its standard operating procedures have been comprehensively documented in certain studies (Lamirand et al. Citation2022). In 2021, CoralNet 1.0 was launched as a new iteration of this tool, boasting an impressive 18.4% reduction in annotation error rates (Chen et al. Citation2021a). Other annotation tools based on random point annotation have also been succinctly summarized and organized in relevant literature (Ayroza et al. Citation2015; Gomes-Pereira et al. Citation2016). For simplicity, we provide a straightforward list in Table 2. However, to fulfill the demands of semantic segmentation, superpixel techniques can extend point annotations into approximate dense semantic labels (Alonso et al. Citation2017). The selection of sparsity in random sampling for superpixels also exerts a notable influence on segmentation outcomes (Yuval et al. Citation2021). Subsequent research on this tool has yielded several enhancements in both accuracy and processing speed (Alonso et al. Citation2019; Pierce et al. Citation2020). Nevertheless, acquiring precise pixel-wise labels for high-precision tasks, such as fine-grained image segmentation, is still closely linked to human involvement. Various annotation tools for general object segmentation, including Labelbox, are available and are also employed in the semantic segmentation of underwater coral images. Recently, there have been advancements in manual annotation techniques utilizing semi-automatic pixel-wise annotation tools in specific studies (King, Bhandarkar, and Hopkinson Citation2018, Citation2019; Pavoni et al. Citation2020). An exemplary open-source pixel-wise segmentation tool called TagLab (Pavoni et al. Citation2022), meticulously tailored for underwater coral imagery, has emerged in this context. It uses an optimization of DeepLab V3+ to support the analysis of large orthophotos generated through the photogrammetric pipeline. These advancements in image annotation tools and techniques represent progress in the field of underwater coral image segmentation. They enable researchers to more effectively annotate underwater coral images, thereby expediting the analysis process and enhancing the accuracy of subsequent segmentation models. Notably, AI-assisted annotation tools within software, like those found in CoralNet and TagLab, are increasingly utilized for underwater coral image automatic segmentation and analysis (Kopecky et al. Citation2023; Williams et al. Citation2019).

Table 2. Annotation tools used in coral image segmentation.

4.2. Data partitioning

When training a neural network model using data from a dataset, it is essential to consider an optimal strategy. This strategy requires providing the model with an adequate number of training examples during the training process while avoiding overfitting. Simultaneously, one must recognize that insufficient training examples will result in inadequate model training, leading to poor performance during the testing phase. In the case of large datasets, a common practice is hold-out validation. In this approach, the available input data is randomly divided into two parts: a training dataset for model learning and a validation dataset for tuning the network’s hyperparameters. In certain instances, a separate test dataset may also be employed to evaluate the model’s performance and generalization ability. For datasets with a limited number of samples, the prevalent technique is k-fold cross-validation. The sample data is partitioned into k folds, with k-1 folds used for learning and one fold for validation. The evaluation results are then averaged across the k iterations, with k = 5 being a common choice in coral segmentation scenarios (Gómez-Ríos, Tabik, Luengo, Shihavuddin, and Herrera Citation2019; Gómez-Ríos, Tabik, Luengo, Shihavuddin, Krawczyk, et al. Citation2019; Xu et al. Citation2019). Researchers have also explored various techniques to balance the distribution of training and validation datasets. Steffens et al. (Citation2019) created N fixed-size training and validation sets from randomly selected images and applied cosine distances weighted by the overall class distribution to compare the training/validation distributions. In another study (Mizuno et al. Citation2019), coral regions are separated into low-transparency and high-transparency datasets based on coral coverage for coral prediction evaluations. Moreover, when dividing multiple images as a dataset directly from an entire coral map, a biologically inspired dataset partition method can be employed (Pavoni et al. Citation2019). This method assigns weights to three landscape ecology metrics describing the spatial patterns of benthic communities (measuring the standard deviation, quantity, and density of specimens, respectively) as similarity scores (S). Finally, the three sub-areas with the best S values are selected as three distinct datasets. These approaches emphasize the significance of appropriate data partitioning and provide valuable insights into the various strategies employed in underwater coral image segmentation studies.
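
The following minimal sketch illustrates the k-fold protocol with the commonly used k = 5; the `train_and_evaluate` callable stands in for any training routine and metric of interest and is a hypothetical placeholder, not a specific published implementation.

```python
# Sketch: k-fold cross-validation over a set of coral samples (scikit-learn).
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(samples, labels, train_and_evaluate, k=5):
    """Average the score returned by a user-supplied train_and_evaluate callable
    over k folds; each fold uses k-1 parts for training and 1 part for validation."""
    kfold = KFold(n_splits=k, shuffle=True, random_state=0)
    scores = []
    for train_idx, val_idx in kfold.split(samples):
        scores.append(train_and_evaluate(samples[train_idx], labels[train_idx],
                                         samples[val_idx], labels[val_idx]))
    return float(np.mean(scores))
```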

4.3. Data augmentation

Data augmentation serves as a crucial solution to address the challenges posed by limited data in deep learning. It improves the robustness and generalization capabilities of deep learning models by expanding the size and improving the quality of training datasets. Underwater coral image segmentation, as an application domain, faces constraints in accessing extensive datasets due to the high costs associated with acquiring high-quality images of marine communities in underwater environments. To overcome this limitation, various data augmentation algorithms have been developed, including geometric transformations, colorimetric transformations, and Generative Adversarial Network (GAN)-based methods.

Geometric transformations such as flipping (horizontal or vertical), cropping, scaling, rotation, and translation are commonly used to introduce additional variation into the data and improve model performance. In terms of color correction, previous studies have shown that coral image preprocessing in the Lab color space performs better than in the RGB, HSV, or gray color spaces (Beijbom et al. Citation2012). The Lab color space is a color-opponent space with dimension L for lightness and a and b for the color-opponent dimensions of redness-greenness and blueness-yellowness, respectively, based on nonlinearly compressed CIE XYZ color space coordinates. Moreover, researchers have discovered that the Cr band in the YCbCr color space effectively mitigates the impact of illumination changes and shadows, making it well suited for habitat segmentation (Mohamed, Nadaoka, and Nakamura Citation2022). Other color spaces such as CIELAB and YUV have also been employed in various computer vision tasks and could potentially offer benefits for coral image analysis (Poongodi, Hamdi, and Wang Citation2022). In addition to color space conversion, color normalization is one of the most commonly employed data augmentation methods to improve the contrast of submarine coral images (Pavoni et al. Citation2021).
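
The sketch below combines the basic geometric transforms mentioned above with a conversion from RGB to the Lab color space before network input; the particular transform set, probabilities, and crop size are illustrative assumptions.

```python
# Sketch: geometric augmentation (torchvision) plus RGB-to-Lab conversion (scikit-image).
import numpy as np
from skimage import color
from torchvision import transforms

geometric_augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomRotation(degrees=90),
    transforms.RandomResizedCrop(size=256, scale=(0.8, 1.0)),
])

def rgb_to_lab(image: np.ndarray) -> np.ndarray:
    """Convert an RGB image (uint8, HxWx3) to float Lab channels for network input."""
    return color.rgb2lab(image / 255.0).astype(np.float32)
```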

However, before employing these basic data augmentation algorithms, a unique challenge arises in the context of large-format but limited-quantity high-quality underwater coral remote sensing images. This problem mainly involves two aspects: how to efficiently extract as much useful information as possible from a limited number of images, and how to handle and balance class distribution imbalance in the dataset. This is particularly relevant for methods that use orthomosaic images for image segmentation. The conventional approach involves using a sliding-window division to mitigate these challenges. It typically serves as the first step in data extraction, generating a large number of training samples from large images, but it may fail to address the problem of class imbalance because it gives each target an equal sampling probability (Shihavuddin et al. Citation2013). There is also a data partition method based on Poisson disk sampling, which can generate non-overlapping random samples through a Dart Throwing algorithm with a class-dependent radius (Pavoni et al. Citation2020). Such oversampling strategies leverage the fact that certain corals frequently cover smaller areas; they address class imbalance by increasing the number of samples of the minority classes (either by duplicating existing samples or generating new ones) to balance the class distribution in the dataset. Hard negative sampling is a strategy of selecting negative samples that the model predicts incorrectly (i.e. samples that the model finds difficult to classify correctly) for focused training (Kim, Hong, and Byun Citation2021). This method helps the model better learn to distinguish hard-to-recognize categories or patterns, improving its robustness and accuracy. These two strategies essentially apply further processing to samples generated by sliding-window sampling, focusing respectively on the class imbalance and the learning difficulties that may arise in such datasets.
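
A minimal sketch of sliding-window tiling is shown below; the patch size, stride, and the optional filter that skips mostly unlabeled tiles are illustrative assumptions rather than the exact sampling rules of the cited studies.

```python
# Sketch: sliding-window tiling of a large orthomosaic into training patches.
import numpy as np

def sliding_window_patches(image, label_map, patch=512, stride=256,
                           min_labeled_fraction=0.1):
    """Yield (image_patch, label_patch) pairs; label value 0 is treated as unlabeled."""
    h, w = label_map.shape
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            lab = label_map[top:top + patch, left:left + patch]
            # Optionally skip tiles that are mostly unlabeled to reduce imbalance.
            if np.mean(lab > 0) < min_labeled_fraction:
                continue
            yield image[top:top + patch, left:left + patch], lab
```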

To further expand the number of available images effectively, Pavoni et al. (Citation2019) propose an oversampling method based on the actual area coverage of the specimens. A cutting-window technique is applied to subdivide each annotated coral into a set of overlapping subparts, the number of which is proportional to the size of the coral. These are added to the dataset to address class imbalance and ensure the representation of both small and large corals during network training. Furthermore, GAN-based data augmentation techniques have proven effective in improving the accuracy of underwater image classification by training a generator network to produce synthetic data with statistical properties similar to the original data (Xu et al. Citation2017). However, while GANs (Goodfellow et al. Citation2014) have been utilized for underwater image segmentation, their application to marine biological data, particularly for organisms with complex morphologies such as corals, remains an area that requires further exploration. In the future, it could be beneficial to explore more deep learning-based oversampling strategies for enhancing underwater coral image samples, such as neural style transfer (Tobin et al. Citation2017; Ye et al. Citation2023) and meta-learning schemes (Gama et al. Citation2023; Zoph and Le Citation2016), which could help generate new samples reflecting underrepresented coral features.

5. Network design and optimization techniques

The intricate structures and growth patterns of corals contribute significantly to the high morphological variation observed within and among populations. This diversity makes it challenging to provide comprehensive descriptions of corals using features from a single type, hierarchy, or scale. Consequently, there is a pressing need to establish generalized characteristics that can effectively describe corals within the context of underwater environments. Given the complexities of coral morphology, it becomes imperative to focus on the design, selection and optimization of network models suitable for underwater coral image segmentation. Moreover, the application of techniques like transfer learning and ensemble methods are also critical. These approaches play a pivotal role in our quest for accurate and robust coral image segmentation, ultimately advancing our understanding of coral ecosystems.

5.1. Convolutional neural network (CNN)

5.1.1. Network design and hierarchical feature extraction

Elawady (Citation2015) first introduced a CNN for the coral classification task. They adapted a network model based on LeNet-5, with an input layer comprising the three basic color channels along with additional channels for texture and shape descriptors; these descriptors include the Weber local descriptor, phase congruency, and zero component analysis. Song et al. (Citation2016) present a two-level hierarchical framework that includes two-level feature extractors designed to capture global and local features, respectively. The global feature extractor is trained using the entire image, while the local feature extractors focus on different parts of the image, such as textures, shapes, and edges. The fusion of features from both levels is input into a linear SVM for the final prediction. Moreover, combining features from different convolutional layers of the network can result in a more powerful image representation. Mahmood et al. (Citation2020) combine the features extracted from the last three convolutional layers of a pre-trained ResNet to yield higher classification accuracy for coral classification. These combined feature vectors are then input into classifiers such as an SVM, Softmax, or a shallow CNN for classification.
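
The sketch below illustrates this kind of multi-layer feature fusion: the outputs of the last three residual stages of a pretrained ResNet-50 are pooled, concatenated, and handed to a linear SVM. The backbone choice, stage names, and classifier are illustrative assumptions in the spirit of, not a reproduction of, the cited work.

```python
# Sketch: fusing features from several convolutional stages of a pretrained ResNet
# and classifying the fused vectors with a linear SVM.
import torch
from torchvision import models
from torchvision.models.feature_extraction import create_feature_extractor
from sklearn.svm import LinearSVC

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
extractor = create_feature_extractor(
    backbone, return_nodes={"layer2": "f2", "layer3": "f3", "layer4": "f4"})

def fused_features(batch: torch.Tensor) -> torch.Tensor:
    """Global-average-pool each stage output and concatenate into one vector per image."""
    with torch.no_grad():
        feats = extractor(batch)
    pooled = [f.mean(dim=(2, 3)) for f in feats.values()]  # one (N, C) tensor per stage
    return torch.cat(pooled, dim=1)

# Usage (hypothetical data): patches is an (N, 3, 224, 224) tensor, labels a length-N array.
# svm = LinearSVC().fit(fused_features(patches).numpy(), labels)
```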

5.1.2. Multi-scale and multi-model data input

A common technique in coral classification involves using patches of different scales centered on annotated pixels. This approach has been adopted in numerous coral classification studies (Mahmood et al. Citation2016a, Citation2018; Modasshir, Li, and Rekleitis Citation2018). Among these, the multi-patch network MDNet (depicted in Figure 4), proposed by Modasshir, Li, and Rekleitis (Citation2018), not only learns the texture and morphology of corals by cropping patches of different scales but also introduces dense connections between layers to reduce overfitting. Furthermore, leveraging multi-modal data can provide diverse information sources, thus addressing the limitations of single-modal data and enhancing the accuracy and robustness of segmentation models. Beijbom et al. (Citation2016) introduce the application of FluorIS, a low-cost modified consumer camera, to capture wide-band, wide-field-of-view fluorescence images. To obtain high-contrast corals as well as information on non-fluorescing substrates, this approach combines fluorescence and reflectance images. Similarly, Xu et al. (Citation2019) utilize a conditional GAN-based image translator, trained on paired multi-modal data from the EFC dataset, to learn the mapping from reflectance to fluorescence images, generating translated images that provide complementary information. Moreover, Mahmood et al. (Citation2016b) utilize the approach proposed by Shihavuddin et al. (Citation2013), which incorporates Gabor filter responses, the Gray Level Co-occurrence Matrix, and the Complete Local Binary Pattern as texture descriptors, along with relative angle and hue color histograms as color descriptors, to represent hand-crafted features. These features are then combined with the corresponding CNN features obtained from multi-scale patches. nViewNet, proposed by King, Bhandarkar, and Hopkinson (Citation2019), feeds images corresponding to each viewpoint into the trained network for classification. Each image casts a vote for the grid face to predict its class. Finally, nViewNet aggregates the votes of all images and outputs the class label with the highest number of votes as the final classification label for that mesh face. The purpose of this approach is to leverage information from multiple viewpoints to improve classification accuracy.
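
A minimal sketch of multi-scale patch extraction is given below: several patches of different sizes are cropped around the same annotated point and resized to a common input resolution, so that a network sees both fine texture and broader colony morphology. The scale set and output size are assumptions for illustration.

```python
# Sketch: multi-scale patch extraction around one annotated point (Pillow).
from PIL import Image

def multi_scale_patches(image, row, col, scales=(112, 224, 448), out_size=224):
    """Return patches cropped at several scales, all resized to out_size x out_size."""
    patches = []
    for size in scales:
        half = size // 2
        left = max(0, min(col - half, image.width - size))
        top = max(0, min(row - half, image.height - size))
        crop = image.crop((left, top, left + size, top + size))
        patches.append(crop.resize((out_size, out_size), Image.BILINEAR))
    return patches
```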

Figure 4. MDNet. (a) The structure of MDNet; (b) the structure of Dense Block.

5.1.3. Transfer learning and network integration

In addition to network design and data preparation, transfer learning has been employed to enhance the performance of coral image classification. Various backbone networks, including VGG16, ResNet50, ResNet101, EfficientNet-B0, EfficientNet-B1, and EfficientNet-B4, were tested for their suitability in the proposed CoralNet classification engine (Chen et al. Citation2021a). Experimental results show that EfficientNet-B0 achieves the best balance of accuracy and speed. Gonzalez-Rivero et al. (Citation2020) utilize the VGG-D 16 convolutional neural network architecture, pretrained on the large ImageNet dataset, and fine-tune its parameters with a specific coral training dataset. Jackett et al. (Citation2023) develop a deep learning system for automatic reef-building coral identification, employing the ResNet50 network for transfer learning. The pre-training and fine-tuning approach demonstrated superior learning efficiency compared to random initialization and fixed pretraining. Network integration methods have also been explored. Marre et al. (Citation2020) extract features from the global average pooling layers of four ResNet18s and feed them into a Multilayer Perceptron made of three fully connected layers. In another study (Gómez-Ríos, Tabik, Luengo, Shihavuddin, and Herrera Citation2019), a two-stage classifier consisting of three ResNets is designed to automatically classify corals from texture and structure images. Multiple CNN models, including DenseNet (Huang et al. Citation2017), are fine-tuned using different strategies, and their predictions are combined to achieve higher performance than a single CNN model (Lumini, Nanni, and Maguolo Citation2020).
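
The following minimal sketch shows the freeze-and-fine-tune pattern described above for a pretrained ResNet-50: early layers are frozen and only the last residual stage plus a new classification head remain trainable. The choice of which layers to freeze, the class count, and the learning rate are illustrative assumptions.

```python
# Sketch: transfer learning by freezing early layers of a pretrained ResNet-50.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 9  # hypothetical number of benthic classes

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # new coral-specific head

for name, param in model.named_parameters():
    # Keep general low-level ImageNet features; adapt only layer4 and the new head.
    param.requires_grad = name.startswith(("layer4", "fc"))

optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4)
```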

5.2. Fully Convolutional Neural Network (FCNN)

Unlike a typical CNN, which uses fully connected layers to output a fixed-length feature vector after the convolutional layers, FCNNs employ deconvolution layers to upsample the feature map from the last convolutional layer, recovering it to the same size as the input image. FCNNs thus offer the advantage of processing images of any size while preserving the original spatial information. In the field of semantic segmentation of underwater coral images, popular neural networks can be categorized into three main types: context-based, deconvolution-based, and feature-enhancement-based methods, which extract semantic features from different perspectives (Hao, Zhou, and Guo Citation2020). Here, we present examples of commonly used networks of each type in coral segmentation. DilatedNet (Yu and Koltun Citation2015), as a context-based approach, introduces a context module that systematically aggregates multi-scale context information by employing dilated convolutions with different dilation rates. This is achieved without introducing extra computational overhead. Both UNet and DeepLab v3+ are feature-enhancement-based networks that combine shallow and deep features to preserve local and global image details. UNet (Ronneberger, Fischer, and Brox Citation2015) is a symmetric encoder-decoder structure that employs skip connections for feature fusion along the channel dimension. DeepLab v3+ (Chen et al. Citation2018) takes DeepLab v3 (Chen et al. Citation2017) as an encoder and adds a simple yet effective decoder module to refine segmentation results. It integrates the Atrous Spatial Pyramid Pooling module, which enables multi-scale analysis and context aggregation. SegNet (Badrinarayanan, Kendall, and Cipolla Citation2017) has a structure similar to UNet, but it saves the pooling indices during the encoding phase and recovers spatial information through unpooling. Bayesian SegNet (Kendall, Badrinarayanan, and Cipolla Citation2015) is a probabilistic extension of SegNet that models prediction uncertainty.
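
To make the encoder-decoder pattern concrete, the toy model below uses convolutional downsampling, learned upsampling, and a single channel-wise skip connection. It is a minimal sketch of the design idea, not a faithful re-implementation of UNet, SegNet, or DeepLab.

```python
# Sketch: a tiny encoder-decoder FCNN with one skip connection (PyTorch).
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class TinySegNet(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.enc1 = conv_block(3, 32)
        self.enc2 = conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
        self.dec = conv_block(64, 32)              # 64 = 32 (skip) + 32 (upsampled)
        self.head = nn.Conv2d(32, num_classes, 1)  # per-pixel class scores

    def forward(self, x):
        s1 = self.enc1(x)                 # full-resolution features
        s2 = self.enc2(self.pool(s1))     # half-resolution features
        up = self.up(s2)                  # learned upsampling back to full resolution
        fused = self.dec(torch.cat([up, s1], dim=1))  # channel-wise skip fusion
        return self.head(fused)           # (N, num_classes, H, W) logits
```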

5.2.1. Network design and hierarchical feature extraction

In this section, we present various network improvements related to the semantic segmentation of underwater coral images. Mizuno et al. (Citation2020) construct a deep neural network based on UNet and integrate a 50% dropout layer into the first three decoder layers to mitigate overfitting and enhance model generalization. Zhang, Grün, and Li (Citation2022) add a residual refinement module after the UNet network output to improve the predicted coral boundaries. DeeperLabC, an extension of the DeepLab family, is created to classify coral and non-coral areas in single-channel images (Song et al. Citation2021); its backbone is derived from a fine-tuned pre-trained ResNet34, and a Class Activation Map (CAM) module is incorporated at the end to visualize feature maps for semantic segmentation. Islam et al. (Citation2020b) introduce SUIM-Net (as shown in Figure 5), a deep residual model following the encoder-decoder architecture, whose core component, the Residual Skip Block, offers optional hierarchical skip connections: when skip = 0, the skip connection is fed from an intermediate convolution layer; otherwise it is fed from the block input for local residual learning.

Figure 5. SUIM-Net. (a) The structure of SUIM-Net; (b) the structure of Residual Skip Block.
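To clarify the switchable skip connection described above, the sketch below implements a simplified residual block whose skip source can be either the block input or an intermediate convolution output. It is loosely inspired by the Residual Skip Block idea and is not the SUIM-Net code; layer counts and channel sizes are assumptions.

```python
# Simplified residual block with a selectable skip source (illustrative only).
import torch
import torch.nn as nn

class ResidualSkipBlock(nn.Module):
    def __init__(self, channels=64, skip_from_input=True):
        super().__init__()
        self.skip_from_input = skip_from_input
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        h1 = self.act(self.conv1(x))
        h2 = self.act(self.conv2(h1))
        # Skip from the block input (local residual learning) or from an
        # intermediate convolution output, mirroring the two modes above.
        skip = x if self.skip_from_input else h1
        return self.act(self.conv3(h2) + skip)
```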

5.2.2. Multimodal data input

Furthermore, leveraging multimodal data has emerged as a promising direction to enhance prediction accuracy in deep learning-based coral semantic segmentation. King, Bhandarkar, and Hopkinson (Citation2019) design TwinNet, a stereo FCNN architecture utilizing the front end of Dilation8. Left-perspective and right-perspective images are processed independently within the network but share weights; the separate Dilation8 outputs are fed to different Siamese subnetworks and concatenated along the channel axis before producing the final output. Distance information contained in depth maps can also be exploited: Zhong et al. (Citation2023) effectively capture shape information in depth images by replacing vanilla convolutions with ShapeConv (Cao et al. Citation2021). The incorporation of grayscale information has also shown benefits in coral prediction. Thomas et al. (Citation2022) employ two distinct UNet models to handle RGB and grayscale images separately; combining the outputs of this dual-model methodology significantly improves the algorithm’s accuracy by effectively capturing the intricate features present in coral reef images, and even if one model fails, the other can still continue to work and generate the corresponding masks.
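The general fusion pattern shared by these multimodal approaches, two branches whose features are concatenated along the channel axis before a prediction head, can be sketched as follows. This is an illustrative two-stream example, not the TwinNet or dual-UNet code; layer sizes and the weight-sharing choice are assumptions.

```python
# Illustrative two-stream segmentation head with optional weight sharing.
import torch
import torch.nn as nn

class TwoStreamSegHead(nn.Module):
    def __init__(self, num_classes=2, share_weights=True):
        super().__init__()
        def make_branch():
            return nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            )
        self.branch_a = make_branch()
        # Siamese setup: reuse the same module so both views share weights.
        self.branch_b = self.branch_a if share_weights else make_branch()
        self.head = nn.Conv2d(64, num_classes, 1)

    def forward(self, view_a, view_b):
        fa, fb = self.branch_a(view_a), self.branch_b(view_b)
        # Concatenate on the channel axis, then predict per-pixel class logits.
        return self.head(torch.cat([fa, fb], dim=1))

# Example with a left/right stereo pair:
logits = TwoStreamSegHead()(torch.randn(1, 3, 128, 128),
                            torch.randn(1, 3, 128, 128))
```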

5.2.3. Transfer learning and network integration

In the realm of deep learning-based underwater coral image segmentation, aside from model optimization, techniques such as transfer learning and ensemble networks are also commonly employed. For instance, Pavoni et al. (Citation2019, Citation2020) choose the pre-trained CNN frameworks Bayesian SegNet and DeepLab v3+ for the semantic segmentation of underwater coral images. Furthermore, selectively freezing the lower-layer weights of neural networks is a popular fine-tuning method that encourages the network to adapt to the specialized features of the coral dataset while preserving the general characteristics learned from standard datasets (Jackett et al. Citation2023; Thomas et al. Citation2022). In terms of ensemble methods, Sui et al. (Citation2022) present an automated three-stage parallel deep learning pipeline for processing coral data in turbid water conditions: first, color correction is performed using UGAN; second, coral areas are distinguished from non-coral areas using a UNet based on EfficientNet-B6; lastly, coral species are classified using DenseNet-201. This ensemble framework for coral reef monitoring significantly reduces the time required for data processing, and parallel ensemble methods also demonstrate stable calibration across dataset shifts (Wyatt et al. Citation2022).
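The selective-freezing strategy mentioned above can be sketched in a few lines of PyTorch: the early encoder stages keep their generic pretrained features while the higher layers and the segmentation head adapt to the coral data. Which stages to freeze, the class count, and the optimizer settings below are assumptions for illustration.

```python
# Hedged sketch of partial fine-tuning: freeze early ResNet stages of a
# pretrained DeepLabv3 model and train only the remaining parameters.
import torch
from torchvision.models import ResNet50_Weights
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights_backbone=ResNet50_Weights.IMAGENET1K_V1,
                           num_classes=5)  # hypothetical class count

# Freeze conv1/bn1/layer1/layer2; layer3, layer4 and the head stay trainable.
for name, param in model.backbone.named_parameters():
    if name.startswith(("conv1", "bn1", "layer1", "layer2")):
        param.requires_grad = False

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-3, momentum=0.9)
```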

6. Evaluation metrics and loss functions

6.1. Evaluation metrics

To evaluate the validity and effectiveness of segmentation methods, it is important to use standard and widely recognized metrics for a rigorous performance assessment (Garcia-Garcia et al. Citation2017). Here are some commonly used evaluation metrics for image segmentation that can be employed to quantify the performance of segmentation methods:

6.1.1. Pixel accuracy (PA)

It is the simplest evaluation metric, indicating the ratio of correctly classified pixels to all pixels in the image. However, it simply counts the correct pixels, which may not fully reflect the network’s performance, especially in scenarios involving multi-class segmentation.

6.1.2. Mean pixel accuracy (mPA)

It calculates, for each class, the ratio of correctly classified pixels to the total number of pixels belonging to that class, and then averages these per-class accuracies across all classes.

6.1.3. Precision, recall and f-score

Precision measures the proportion of pixels predicted as positive that are truly positive, while Recall measures the proportion of truly positive pixels that the model correctly identifies. The F-score, the harmonic mean of Precision and Recall, offers a balanced assessment of overall accuracy.

6.1.4. Mean intersection over union (mIoU)

It is widely regarded as the standard metric for segmentation tasks. It computes a ratio between the intersection and union of true labels and predictions for each class, and then averages the IoU values across all classes. In most cases, mIoU yields lower values compared to mPA because the denominator of the IoU expression for a class includes pixels that the network has misclassified.
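For reference, the metrics defined in Sections 6.1.1–6.1.4 can all be derived from a single per-class confusion matrix, as the minimal NumPy sketch below shows. Clamping the denominators to one for classes absent from an image is a simplification assumed here for robustness, not part of the formal definitions.

```python
# PA, mPA, mIoU and macro F-score from a confusion matrix (illustrative).
import numpy as np

def confusion_matrix(pred, gt, num_classes):
    """Accumulate a confusion matrix from integer label maps of equal shape."""
    valid = (gt >= 0) & (gt < num_classes)
    idx = num_classes * gt[valid].astype(int) + pred[valid].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes,
                                                                 num_classes)

def segmentation_metrics(cm):
    tp = np.diag(cm).astype(float)
    gt_total = cm.sum(axis=1)        # ground-truth pixels per class
    pred_total = cm.sum(axis=0)      # predicted pixels per class
    union = gt_total + pred_total - tp
    pa = tp.sum() / cm.sum()                              # pixel accuracy
    mpa = np.mean(tp / np.maximum(gt_total, 1))           # mean pixel accuracy
    miou = np.mean(tp / np.maximum(union, 1))             # mean IoU
    precision = tp / np.maximum(pred_total, 1)
    recall = tp / np.maximum(gt_total, 1)
    f1 = np.mean(2 * precision * recall / np.maximum(precision + recall, 1e-12))
    return pa, mpa, miou, f1

# Toy example with three classes:
gt = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
print(segmentation_metrics(confusion_matrix(pred, gt, 3)))
```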

6.1.5. Boundary IoU

It is proposed by Cheng et al. (Citation2021). This metric focuses on evaluating boundary errors between prediction results and ground truth. It computes the intersection-over-union of the sets of mask pixels lying within a certain distance of the corresponding ground-truth or prediction boundary contours. The inclusion of Boundary IoU allows for a more precise assessment of the quality of coral boundaries in segmentation results. While many studies on coral segmentation traditionally rely on metrics like mPA and mIoU for evaluation, recent works have introduced Boundary IoU as an additional evaluation metric to assess the quality of coral boundary delineation in segmentation results (Zhang, Grün, and Li Citation2022).
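A rough binary-mask sketch of this idea is given below: each mask is reduced to a boundary band of pixels within d pixels of its contour (the mask minus its erosion), and the IoU is computed on these bands. This is a simplified illustration of the concept, not the reference implementation of Cheng et al. (Citation2021), and the distance parameter d is an assumption.

```python
# Simplified Boundary IoU for binary masks (illustrative approximation).
import numpy as np
from scipy.ndimage import binary_erosion

def boundary_band(mask, d=2):
    """Mask pixels within d pixels of the boundary: mask minus its erosion."""
    eroded = binary_erosion(mask, iterations=d, border_value=1)
    return mask & ~eroded

def boundary_iou(gt, pred, d=2):
    gb = boundary_band(gt.astype(bool), d)
    pb = boundary_band(pred.astype(bool), d)
    inter, union = (gb & pb).sum(), (gb | pb).sum()
    return inter / union if union else 1.0
```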

6.2. Loss functions

The loss function plays a pivotal role during the model training stage by quantifying the disparity between the predicted value and the ground truth. These functions can generally be categorized into four types: distribution-based, region-based, boundary-based, and compounded loss (Jadon Citation2020). In this section, we provide a brief overview of the commonly used loss functions in the context of coral image segmentation.

Distribution-based losses are designed to minimize differences between probability distributions of random variables within a dataset. One of the most widely used distribution-based loss functions in coral image segmentation is Cross-Entropy Loss, favored for its ease of gradient calculation. However, in situations characterized by class imbalance, cross-entropy loss may bias the model toward predicting the majority class while neglecting the minority class. To address this issue, the Weighted Cross-Entropy Loss assigns greater weight to specific classes, thereby enhancing the model’s learning efficiency for them. Another frequently adopted loss function for coral image semantic segmentation is the Focal Loss (Lin et al. Citation2017). By introducing a focal factor into the cross-entropy loss, this approach mitigates the class imbalance between foreground and background, effectively reducing the weight of the numerous easy negative samples during training. Dice Loss (Milletari, Navab, and Ahmadi Citation2016), a standard region-based loss function, is well-suited for scenarios marked by severe class imbalance and is more robust to it than cross-entropy loss; because it incorporates both the ground truth and the prediction in its calculation, it is more sensitive to minority-class predictions. Tversky Loss (Salehi, Erdogmus, and Gholipour Citation2017) is another commonly employed loss function that controls the balance between precision and recall through two additional parameters. Additionally, Boundary Loss (Kervadec et al. Citation2021), a boundary-based loss, is introduced to address issues associated with region-based losses in highly imbalanced segmentation tasks. In their exploration of various loss functions for model optimization, Pavoni et al. (Citation2020) found that Focal Tversky Loss is the most efficient cost function for training coral segmentation models, though it may not be the ideal choice for datasets with extreme class imbalance. Consequently, the field often resorts to weighted or hybrid loss functions to effectively address these challenges (Qin et al. Citation2022; Steffens et al. Citation2019).
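As a reference point for the discussion above, minimal PyTorch sketches of Dice Loss and Focal Loss for multi-class segmentation are given below; the smoothing constant and the focusing parameter gamma are common but arbitrary default choices, not values prescribed by the cited coral studies.

```python
# Illustrative Dice Loss and Focal Loss for multi-class segmentation.
import torch
import torch.nn.functional as F

def dice_loss(logits, target, eps=1e-6):
    """logits: (B, C, H, W); target: (B, H, W) with integer class indices."""
    probs = logits.softmax(dim=1)
    onehot = F.one_hot(target, logits.shape[1]).permute(0, 3, 1, 2).float()
    inter = (probs * onehot).sum(dim=(0, 2, 3))
    denom = probs.sum(dim=(0, 2, 3)) + onehot.sum(dim=(0, 2, 3))
    return 1.0 - ((2 * inter + eps) / (denom + eps)).mean()

def focal_loss(logits, target, gamma=2.0):
    """Cross-entropy down-weighted for well-classified (easy) pixels."""
    ce = F.cross_entropy(logits, target, reduction="none")   # (B, H, W)
    pt = torch.exp(-ce)                                      # prob. of true class
    return ((1.0 - pt) ** gamma * ce).mean()
```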

7. Experiments and discussion

In this section, we conduct a comparative analysis of representative fully supervised semantic segmentation methods on the same dataset. More specifically, the selected competitors include MultiResUNet (Ibtehaz and Rahman Citation2020), DeepLab V3+ (Chen et al. Citation2018), UNet (Ronneberger, Fischer, and Brox Citation2015), TransUNet (Chen et al. Citation2021b), PointRend (Kirillov et al. Citation2020), CCNet (Huang et al. Citation2019), SegFormer (Xie et al. Citation2021), Segmenter (Strudel et al. Citation2021), SwinUNet (Cao et al. Citation2022), LEDNet (Wang et al. Citation2019), BiSeNet (Yu et al. Citation2018), ShelfNet (Zhuang et al. Citation2019) and STDC (Fan et al. Citation2021), covering multiple kinds of approaches discussed in Section 5, as well as state-of-the-art methods that have yet to be applied in the domain of coral semantic segmentation. For these methods, we employ the official models provided by the respective authors for our assessment. It is important to note that the existing published datasets differ in their annotation methods, acquisition scenarios, and data collection conditions, making them often tailored for specific research methods. Therefore, our evaluation was conducted on a custom coral dataset, which represents a significant contribution of this paper. This dataset can be utilized for testing and developing various pixel-level semantic segmentation models. It comprises millimeter-resolution images captured around Moorea Island by the Moorea Island Digital Ecosystem Avatar project. Specifically, it includes images of five underwater coral reef plots, covering a total seabed area of about 180 square meters. The images used for training and validation were densely labeled semi-automatically with semantic annotations by experts with the support of TagLab, featuring two main coral categories (Elkhorn coral and Pocillopora) and three health states (living coral, dead coral, and bleached coral). All data were collected during the summer months from 2017 to 2019 using an underwater photography system.

Our experiments are executed on a computer running the Ubuntu operating system, equipped with a 20-thread Intel Core i7-12700F CPU, 64 GB RAM, and two GeForce RTX 3060 Ti GPUs. Table 3 presents an objective evaluation of performance indicators, including mPA, mIoU, and F1 scores. Beyond prediction accuracy, execution speed is also significant for semantic segmentation algorithms; therefore, frames per second (FPS) values are also included in Table 3. For a more intuitive representation of the results, please refer to Figure 6.
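For completeness, FPS figures of this kind typically follow the GPU timing pattern sketched below: a few warm-up passes followed by synchronized timing over repeated forward passes. The input size, number of runs, and warm-up count are assumptions, not the exact protocol of each compared method.

```python
# Illustrative FPS measurement for a segmentation model on a GPU.
import time
import torch

@torch.no_grad()
def measure_fps(model, input_shape=(1, 3, 512, 512), runs=100, device="cuda"):
    model = model.to(device).eval()
    x = torch.randn(*input_shape, device=device)
    for _ in range(10):                 # warm-up iterations
        model(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(runs):
        model(x)
    torch.cuda.synchronize()            # wait for all kernels to finish
    return runs / (time.time() - start)
```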

Figure 6. Scatter plots depict the evaluation metrics, mIoU and FPS, for various network models trained on a customized coral segmentation dataset. Models exhibiting superior performance are positioned closer to the upper-right corner. The model with the highest mIoU is highlighted along the vertical axis, while the model with the highest FPS is highlighted along the horizontal axis. The figure reveals that BiSeNet emerges as the most efficient and top-performing model, showcasing its proximity to the upper-right corner, signifying optimal trade-offs between mIoU and FPS.

Table 3. The performance of segmentation algorithms on custom dataset.

Examples of images predicted by various methods are shown in Figure 7.

Figure 7. Predicted results comparison on the image patches taken from custom dataset. (a) Denotes the input test image, (b) the ground truth mask. (c–o) Exhibit the outcomes predicted by different network models.

Notably, all experiments maintain consistent testing environments, hyperparameters, and other settings. The objective of these comparative experiments is not merely to depict the present landscape of coral semantic segmentation research but also to help readers swiftly and comprehensively select a model suited to their specific task demands. In fact, a completely fair comparison among these competitors is difficult given the complexity of the settings. Nevertheless, we can draw the following conclusions from the comparison of accuracy, inference speed, and visual quality: (i) In data-limited tasks such as coral semantic segmentation, a balance of accuracy and efficiency can be achieved by choosing lightweight networks. (ii) Overall, the predicted images exhibit noticeable visual disparities from their ground truth counterparts, particularly at edges and in low-contrast regions; for instance, these models frequently generate corals with overly smooth boundaries. (iii) Compared to FCNNs, Transformer-based models offer inferior predictive performance: SegFormer and Segmenter, unlike SwinUNet, suffer from overfitting despite using their smallest network configurations, which can potentially be attributed to their large parameter counts and their original design for natural language processing tasks. (iv) Models that incorporate pretrained weights, such as TransUNet, SwinUNet, and BiSeNet, demonstrate superior performance, highlighting the critical role of transfer learning in our study.

8. Current challenges and future directions

The extensive research reviewed in Section 5 indicates that the study of segmentation of underwater coral imagery has garnered increasing attention from researchers as the scientific community gains access to a wealth of underwater imagery data. Efforts have been made to apply a wide array of advancements from the fields of general photogrammetric computer vision and machine learning to large underwater datasets, including those containing coral images, leading to innovative applications and ongoing exploration, with significant progress being achieved. However, as highlighted in the literature review and current status discussions earlier, we recognize that the primary issues to be addressed are related to datasets and image segmentation algorithms. Therefore, in this section, we will further discuss the challenges and promising directions of coral semantic image segmentation, focusing on these two key topics.

8.1. Image datasets

The data for model training may be as crucial as the model architecture for research on coral image semantic segmentation, especially for deep learning-based solutions. In recent years, several universities and research institutions have developed underwater coral imagery datasets, as listed earlier in this paper. These datasets have undoubtedly facilitated progress within the research community on the task of underwater coral image segmentation. However, compared with extensive general-purpose datasets featuring common objects such as buildings, vehicles, and landscapes, the lack of standard underwater coral image datasets for segmentation is still striking. Furthermore, the design of coral datasets is often tailored to specific objectives. For instance, coral images from the EILAT dataset comprise low-resolution patches with texture details, yet they fail to reflect the complete coral structure; while the UCSD Mosaics dataset presents comprehensive depictions of individual corals, the absence of underwater image processing leads to confusing color features. We have also found that the SSPQICD dataset from the University of Queensland contains over a million underwater coral images collected from 860 locations around the world. It boasts more than ten thousand annotated images, most accompanied by geolocation data and subjected to photogrammetric processing, surpassing the quality of most datasets we have reviewed. However, these annotations are limited to sparse labeling, the actual coverage area of each image is merely one square meter, and most of the data lacks multi-temporal in-situ observations. The dataset we built for our experiments has rigorous underwater geographical referencing, is processed through strict photogrammetry and radiometric calibration, and provides three years of continuous coral observation images with dense annotations at the same location; however, its volume is still not large enough, and the coral types it covers are not diverse enough. Considering that data is the foundation for more effective precision segmentation of underwater coral images based on deep learning in the future, we still face challenges related to the difficulty of acquiring larger, more representative, and richer underwater coral image datasets, as well as the cost of annotation. We also believe that, with the deployment of low-cost, more autonomous Unmanned Underwater Vehicles (UUVs) equipped with consumer-grade observation equipment and the emergence of more artificial intelligence-driven annotation tools, solutions to these challenges may be found in the near future. At the same time, we invite fellow researchers to consider the methodologies and approaches discussed in this review as potential tools in their own research endeavors. Our aim is to contribute to the collective knowledge and encourage the exploration of diverse strategies that have shown promising results in our experience.

8.2. Application-oriented segmentation algorithms

Segmentation of underwater coral images is a crucial tool for more accurate and fine-grained monitoring of coral health and growth distribution, and there is an urgent need for further research and practical application innovations. To begin with, coral images captured in underwater environments are likely to suffer from distinctly different degradations, posing challenges to existing semantic segmentation algorithms. It is therefore necessary to develop models that can adapt to ever-changing environments while ensuring robust generalization. In addition, most current coral semantic segmentation methods rely on large models, such as deep networks, and consequently demand extensive computational resources and time for forward inference, as well as substantial parameter storage. However, these resources are often limited in practical applications: only a handful of international corporations have access to hundreds of thousands of Graphics Processing Units, and non-profit universities and research institutions find it difficult to afford them. Additionally, some application scenarios cannot utilize such large-scale computing resources due to constraints on energy and space. For example, designing a lightweight model that can be integrated with UUVs to achieve rapid and accurate image processing without additional hardware is one of the major current needs for underwater coral ecosystem monitoring; inspiration for such lightweight network designs can be drawn from the real-time segmentation networks evaluated in our experiments. Furthermore, the use of different imaging devices makes the already labor-intensive coral image annotation even more complex. Therefore, developing segmentation models that can operate with limited training data or even in an unsupervised manner is a direction that should be explored.

9. Conclusions

In recent years, due to ongoing global warming, incidents of coral bleaching and death have become increasingly frequent. Consequently, geospatial information scientists and oceanographers are in growing need of advanced image segmentation tools to extract quantitative data from underwater coral remote sensing images. These tools enable more precise and detailed assessments of coral health and recovery capabilities. Precisely because this field of research is receiving growing attention, it is undergoing unprecedented development and urgently requires timely reviews to provide the research community with an in-depth and comprehensive understanding. This paper reviews and critically summarizes supervised image segmentation algorithms and strategies for underwater coral images based on annotated data, including block-based classification algorithms and pixel-level precise segmentation algorithms. In addition, the paper provides a detailed summary of open-source datasets, advanced data augmentation techniques, and comprehensive accuracy assessment metrics for training and evaluating underwater coral image segmentation models. Our analysis indicates that, despite recent significant progress in underwater coral image segmentation research, numerous challenges mentioned in this paper still require further investigation and resolution. These include the availability of generic datasets for model training and testing, as well as the development of robust deep learning models tailored to applications. Hence, we have also created and made open source a multi-temporal underwater coral dataset, which has undergone rigorous photogrammetric processing and offers densely labeled images with one-to-one correspondence between images and annotations. This dataset provides a foundation for the research community to evaluate and develop new algorithms. In conclusion, we hope that these open questions can serve as beacons for future research directions, guiding the way toward promising avenues of study. We believe that, through the joint efforts of geospatial information scientists and oceanographers, aided by deep learning technology, successful experiences from terrestrial ecosystems can be applied to rapidly changing underwater coral ecological remote sensing monitoring. This will provide powerful advanced technological means for the sustainable prosperity and protection of coral reef ecosystems in the near future. All abbreviations used in this paper can be found in the Appendix.

Acknowledgments

We would especially like to thank Prof. Matthias Troyer for the financial and scientific support he provided. Special thanks are extended to the Institute of Geodesy and Photogrammetry and the Institute for Theoretical Physics at ETH Zürich.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work is supported by the U.S. National Science Foundation [Grant No.OCE 2224354] and earlier awards for the Moorea Coral Reef LTER, the Italian Minister of University and Research [Grant No.PNRA18 00263-B2], and the National Natural Science Foundation of China [Grant No.41901407].

Notes on contributors

Ming Li

Ming Li received his PhD degree from Wuhan University. He is a Research Scientist Assistant at ETH Zürich and an Associate Research Professor at Wuhan University. His main research interests include the principles and methods of machine learning, photogrammetric computer vision, robotics, underwater photogrammetry, and remote sensing.

Hanqi Zhang

Hanqi Zhang is a master’s degree student at Wuhan University. Her main interests lie in machine learning and photogrammetric computer vision, particularly underwater photogrammetry and deep learning-based semantic image segmentation.

Armin Gruen

Armin Gruen received his PhD degree from the Technical University Munich (TUM) and an Honorary Doctorate from Aristotle University. Since 1984, he has been a Full Professor at ETH Zürich, Switzerland. He is a member of the Swiss National Academy of Natural Sciences. He has been awarded the Brock Gold Medal in recognition of outstanding contribution to photogrammetry in 2008, along with a number of other internationally recognized awards and honors. His research interests include 3D cloud mapping and tracking, sensor modeling and image matching of three-line-scanner imagery, digital photogrammetry and remote sensing for Cultural Heritage and natural hazards, satellite and UAV photogrammetry, underwater photogrammetry for coral monitoring, bathymetric Lidar, etc.

Deren Li

Deren Li received his PhD degree from the University of Stuttgart and was awarded an honorary doctorate from ETH Zürich. Since 1986, he has been a Full Professor at Wuhan University, China. He is a member of both the Chinese Academy of Sciences and the Chinese Academy of Engineering. He has been awarded the Brock Gold Medal in recognition of outstanding contribution to photogrammetry in 2020, along with a number of other internationally recognized awards and honors. His research interests include photogrammetry and remote sensing, global navigation satellite system, geographic information system, and their innovation integrations and applications, etc.


References

  • Albayrak, A., and G. Bilgin. 2019. “Automatic Cell Segmentation in Histopathological Images via Two-Staged Superpixel-Based Algorithms.” Medical & Biological Engineering & Computing 57 (3): 653–665. https://doi.org/10.1007/s11517-018-1906-0.
  • Alonso, I., A. Cambra, A. Munoz, T. Treibitz, and A. C. Murillo. 2017. “Coral-Segmentation: Training Dense Labeling Models with Sparse Ground Truth.” Paper presented at the Proceedings of the IEEE international conference on computer vision workshops, Venice, Italy, October 22–29, 2874–2882.
  • Alonso, I., M. Yuval, G. Eyal, T. Treibitz, and A. C. Murillo. 2019. “CoralSeg: Learning Coral Segmentation from Sparse Annotations.” Journal of Field Robotics 36 (8): 1456–1477. https://doi.org/10.1002/rob.21915.
  • Asner, G. P., N. R. Vaughn, C. Balzotti, P. G. Brodrick, and J. Heckler. 2020. “High-Resolution Reef Bathymetry and Coral Habitat Complexity from Airborne Imaging Spectroscopy.” Remote Sensing 12 (2): 310. https://doi.org/10.3390/rs12020310.
  • Asner, G. P., N. R. Vaughn, R. E. Martin, S. A. Foo, J. Heckler, B. J. Neilson, and J. M. Gove. 2022. “Mapped Coral Mortality and Refugia in an Archipelago-Scale Marine Heat Wave.” Proceedings of the National Academy of Sciences 119 (19): e2123331119. https://doi.org/10.1073/pnas.2123331119.
  • Ayroza, C., J. Hustache, L. Fagundes, M. Sumi, and R. Ferrari. 2015. “Finding the Right Tool for Benthic Image Analysis.” Reef Encounters 30 (41): 34–38. https://doi.org/10.53642/OGHN7156.
  • Badrinarayanan, V., A. Kendall, and R. Cipolla. 2017. “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation.” IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12): 2481–2495. https://doi.org/10.1109/TPAMI.2016.2644615.
  • Barrett, N. S., L. Meyer, N. A. Hill, and P. H. Walsh. 2011. Methods for the Processing and Scoring of AUV Digital Imagery from South Eastern Tasmania. Institute for Marine and Antarctic Studies Internal Report.
  • Beijbom, O., P. J. Edmunds, D. I. Kline, B. G. Mitchell, and D. Kriegman. 2012. “Automated Annotation of Coral Reef Survey Images.” Paper presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, June 16–21, 1170–1177.
  • Beijbom, O., P. J. Edmunds, C. Roelfsema, J. Smith, D. I. Kline, B. P. Neal, and M. J. Dunlap, et al. 2015. “Towards Automated Annotation of Benthic Survey Images: Variability of Human Experts and Operational Modes of Automation.” Public Library of Science ONE 10 (7): e0130312. https://doi.org/10.1371/journal.pone.0130312.
  • Beijbom, O., T. Treibitz, D. I. Kline, G. Eyal, A. Khen, B. Neal, Y. Loya, B. G. Mitchell, and D. Kriegman. 2016. “Improving Automated Annotation of Benthic Survey Images Using Wide-Band Fluorescence.” Scientific reports 6 (1): 23166. https://doi.org/10.1038/srep23166.
  • Bellwood, D. R., T. P. Hughes, C. Folke, and M. Nyström. 2004. “Confronting the Coral Reef Crisis.” Nature 429 (6994): 827–833. https://doi.org/10.1038/nature02691.
  • Benkwitt, C. E., C. D’ Angelo, R. E. Dunn, R. L. Gunn, S. Healing, M. L. Mardones, J. Wiedenmann, S. K. Wilson, and N. A. J. Graham. 2023. “Seabirds Boost Coral Reef Resilience.” Science Advances 49 (9): 390. https://doi.org/10.1126/sciadv.adj0390.
  • Bewley, M., A. Friedman, R. Ferrari, N. Hill, R. Hovey, N. Barrett, E. M. Marzinelli, et al. 2015. “Australian Sea-Floor Survey Data, with Images and Expert Annotations.” Scientific Data 2 (1): 1–13. https://doi.org/10.1038/sdata.2015.57.
  • Bunk, J., J. H. Bappy, T. M. Mohammed, L. Nataraj, A. Flenner, B. S. Manjunath, S. Chandrasekaran, A. K. Roy-Chowdhury, and L. Peterson. 2017. “Detection and Localization of Image Forgeries Using Resampling Features and Deep Learning.” Paper presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, July 21–26, 1881–1889.
  • Cao, J., H. Leng, D. Lischinski, D. Cohen-Or, C. Tu, and Y. Li. 2021. “ShapeConv: Shape-Aware Convolutional Layer for Indoor RGB-D Semantic Segmentation.” Paper presented at the Proceedings of the IEEE/CVF International Conference on Computer Vision, Electr Network, Montreal, QC, Canada, October 11–17, 7088–7097.
  • Cao, H., Y. Wang, J. Chen, D. Jiang, X. Zhang, Q. Tian, and M. Wang. 2022. “Swin-UNet: Unet-Like Pure Transformer for Medical Image Segmentation.” Paper presented at the Proceedings of the European Conference on Computer Vision, Tel Aviv, October 24–28, 205–218.
  • Casella, E., P. Lewin, M. Ghilardi, A. Rovere, and S. Bejarano. 2022. “Assessing the Relative Accuracy of Coral Heights Reconstructed from Drones and Structure from Motion Photogrammetry on Coral Reefs.” Coral Reefs 41 (4): 869–875. https://doi.org/10.1007/s00338-022-02244-9.
  • Chen, Q., O. Beijbom, S. Chan, J. Bouwmeester, and D. Kriegman. 2021a. “A New Deep Learning Engine for CoralNet.” Paper presented at the Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, October 11–17, 3693–3702.
  • Cheng, B., R. Girshick, P. Dollár, A. C. Berg, and A. Kirillov. 2021. “Boundary IoU: Improving Object-Centric Image Segmentation Evaluation.” Paper presented at the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Electr Network, Virtual, Online, USA, June 19–25, 15334–15342.
  • Chen, J., Y. Lu, Q. Yu, X. Luo, E. Adeli, Y. Wang, L. Lu, A. L. Yuille, and Y. Zhou. 2021b. “TransUnet: Transformers Make Strong Encoders for Medical Image Segmentation.” arXiv Preprint arXiv. https://doi.org/10.48550/arXiv.2102.04306.
  • Chen, L. C., G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. 2014. “Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFS.” Computer Science 4 (2014): 357–361. https://doi.org/10.48550/arXiv.1412.7062.
  • Chen, L. C., G. Papandreou, F. Schroff, and H. Adam. 2017. “Rethinking Atrous Convolution for Semantic Image Segmentation.” arXiv Preprint arXiv. https://doi.org/10.48550/arXiv.1706.05587.
  • Chen, L. C., Y. Zhu, G. Papandreou, F. Schroff, and H. Adam. 2018. “Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation.” Paper presented at the Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, September 8–14, 801–818.
  • Curtis, E. J., J. M. Durden, B. J. Bett, V. A. Huvenne, N. Piechaud, J. Walker, J. Albrecht, et al. 2024. “Improving Coral Monitoring by Reducing Variability and Bias in Cover Estimates from Seabed Images.” Progress in Oceanography 222:103214. https://doi.org/10.1016/j.pocean.2024.103214.
  • Daoud, M. I., A. A. Atallah, F. Awwad, M. Al-Najjar, and R. Alazrai. 2019. “Automatic Superpixel-Based Segmentation Method for Breast Ultrasound Images.” Expert Systems with Applications 121:78–96. https://doi.org/10.1016/j.eswa.2018.11.024.
  • Edwards, C. B., Y. Eynaud, G. J. Williams, N. E. Pedersen, B. J. Zgliczynski, A. C. Gleason, J. E. Smith, and S. A. Sandin. 2017. “Large-Area Imaging Reveals Biologically Driven Non-Random Spatial Patterns of Corals at a Remote Reef.” Coral Reefs 36 (4): 1291–1305. https://doi.org/10.1007/s00338-017-1624-3.
  • Elawady, M. 2015. “Sparse Coral Classification Using Deep Convolutional Neural Networks.” arXiv Preprint arXiv. https://doi.org/10.48550/arXiv.1511.09067.
  • Everingham, M., L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. 2010. “The Pascal Visual Object Classes (Voc) Challenge.” International Journal of Computer Vision 88 (2): 303–338. https://doi.org/10.1007/s11263-009-0275-4.
  • Fan, M., S. Lai, J. Huang, X. Wei, Z. Chai, J. Luo, and X. Wei. 2021. “Rethinking BiSenet for Real-Time Semantic Segmentation.” Paper presented at the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Electr Network, Virtual, Online, USA, June 19–25, 9716–9725.
  • Fayaz, S., S. A. Parah, and G. J. Qureshi. 2023. “Efficient underwater image restoration utilizing modified dark channel prior.” Multimedia Tools Applications 82 (10): 14731–14753. https://doi.org/10.1007/s11042-022-13828-6.
  • Gama, P. H. T., H. Oliveira, J. A. Santos, and R. M. Cesar. 2023. “An Overview on Meta-Learning Approaches for Few-Shot Weakly-Supervised Segmentation.” Computers & Graphics 113:77–88. https://doi.org/10.1016/j.cag.2023.05.009.
  • Garcia-Garcia, A., S. Orts-Escolano, S. Oprea, V. Villena-Martinez, and J. Garcia-Rodriguez. 2017. “A Review on Deep Learning Techniques Applied to Semantic Segmentation.” Applied Soft Computing 70:41–65. arxiv preprint arxiv:1704.06857. https://doi.org/10.1016/j.asoc.2018.05.018.
  • Giles, A. B., K. Ren, J. E. Davies, D. Abrego, and B. Kelaher. 2023. “Combining Drones and Deep Learning to Automate Coral Reef Assessment with RGB Imagery.” Remote Sensing 15 (9): 2238. https://doi.org/10.3390/rs15092238.
  • Glynn, P. W., and D. P. Manzello. 2015. “Bioerosion and Coral Reef Growth: A Dynamic Balance.” In Coral Reefs in the Anthropocene, edited by C. Birkeland, 67–97. Dordrecht: Springer. https://doi.org/10.1007/978-94-017-7249-5_4.
  • Gomes-Pereira, J. N., V. Auger, K. Beisiegel, R. Benjamin, M. Bergmann, D. Bowden, and R. S. Santos 2016. “Current and Future Trends in Marine Image Annotation Software.” Progress in Oceanography 149:106–120. https://doi.org/10.1016/j.pocean.2016.07.005.
  • Gómez-Ríos, A., S. Tabik, J. Luengo, A. S. M. Shihavuddin, and F. Herrera. 2019. “Coral Species Identification with Texture or Structure Images Using a Two-Level Classifier Based on Convolutional Neural Networks.” Knowledge-Based Systems 184:104891. https://doi.org/10.1016/j.knosys.2019.104891.
  • Gómez-Ríos, A., S. Tabik, J. Luengo, A. S. M. Shihavuddin, B. Krawczyk, and F. Herrera. 2019. “Towards Highly Accurate Coral Texture Images Classification Using Deep Convolutional Neural Networks and Data Augmentation.” Expert Systems with Applications 118:315–328. https://doi.org/10.1016/j.eswa.2018.10.010.
  • Gonzalez-Rivero, M., O. Beijbom, A. Rodriguez-Ramirez, D. E. Bryant, A. Ganase, Y. Gonzalez-Marrero, and O. Hoegh-Guldberg. 2020. “Monitoring of Coral Reefs Using Artificial Intelligence: A Feasible and Cost-Effective Approach.” Remote Sensing 12 (3): 489. https://doi.org/10.3390/rs12030489.
  • González-Rivero, M., A. Rodriguez-Ramirez, O. Beijbom, P. Dalton, E. V. Kennedy, B. P. Neal, and O. Hoegh-Guldberg. 2019. Seaview Survey Photo-Quadrat and Image Classification Dataset.
  • Goodfellow, I., J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, and Y. Bengio. 2014. “Generative Adversarial Nets.” Paper presented at the Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, Canada, December 8–13. 2672–2680.
  • Hao, S., Y. Zhou, and Y. Guo. 2020. “A Brief Survey on Semantic Segmentation with Deep Learning.” Neurocomputing 406:302–321. https://doi.org/10.1016/j.neucom.2019.11.118.
  • Hedley, J., C. Roelfsema, I. Chollett, A. Harborne, S. Heron, S. Weeks, P. Mumby, et al. 2016. “Remote Sensing of Coral Reefs for Monitoring and Management: A Review.” Remote Sensing 8 (2): 118. https://doi.org/10.3390/rs8020118.
  • Hirra, I., M. Ahmad, A. Hussain, M. U. Ashraf, I. A. Saeed, S. F. Qadri, A. M. Alghamdi, and A. S. Alfakeeh. 2021. “Breast Cancer Classification from Histopathological Images Using Patch-Based Deep Learning Modeling.” Institute of Electrical and Electronics Engineers Access 9:24273–24287.
  • Horoszowski-Fridman, Y. B., I. Izhaki, S. M. Katz, R. Barkan, and B. Rinkevich. 2024. “Shifting Reef Restoration Focus from Coral Survivorship to Biodiversity Using Reef Carpets.” Communications Biology 7 (1): 141. https://doi.org/10.1038/s42003-024-05831-4.
  • Huang, G., Z. Liu, L. Van Der Maaten, and K. Q. Weinberger. 2017. “Densely Connected Convolutional Networks.” Paper presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, July 21–26, 4700–4708.
  • Huang, Z., X. Wang, L. Huang, C. Huang, Y. Wei, and W. Liu. 2019. “CCNet: Criss-Cross Attention for Semantic Segmentation.” Paper presented at the Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, South Korea, October 27–November 2, 603–612.
  • Hughes, T. P., M. L. Barnes, D. R. Bellwood, J. E. Cinner, G. S. Cumming, J. B. Jackson, J. Kleypas, et al.2017. “Coral Reefs in the Anthropocene.” Nature 546 (7656): 82–90.
  • Hu, J., L. Shen, and G. Sun. 2018. “Squeeze-And-Excitation Networks.” Paper presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, June 18–23, 7132–7141.
  • Ibtehaz, N., and M. S. Rahman. 2020. “MultiResunet: Rethinking the U-Net Architecture for Multimodal Biomedical Image Segmentation.” Neural Networks 121:74–87. https://doi.org/10.1016/j.neunet.2019.08.025.
  • IPCC. 2023. “Sections.” In Climate Change 2023: Synthesis Report. Contribution of Working Groups I, II and III to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change, Core Writing Team, edited by H. Lee and J. Romero, 35–115. Geneva, Switzerland: IPCC.
  • Islam, M. J., C. Edge, Y. Xiao, P. Luo, M. Mehtaz, C. Morse, S. S. Enan, and J. Sattar. 2020. “Semantic Segmentation of Underwater Imagery: Dataset and Benchmark.” Paper presented at the Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Electr Network, Las Vegas, NV, USA, October 24–January 24, 1769–1776.
  • Islam, M. J., C. Edge, Y. Xiao, P. Luo, M. Mehtaz, C. Morse, and J. Sattar. 2020b. “Semantic Segmentation of Underwater Imagery: Dataset and Benchmark.” Paper presented at the Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, October 24-October 24. 1769–1776.
  • Islam, M. J., Y. Xia, and J. Sattar. 2020a. “Fast Underwater Image Enhancement for Improved Visual Perception.” IEEE Robotics and Automation Letters 5 (2): 3227–3234.
  • Jackett, C., F. Althaus, K. Maguire, M. Farazi, B. Scoulding, C. Untiedt, and A. Williams, et al. 2023. “A Benthic Substrate Classification Method for Seabed Images Using Deep Learning: Application to Management of Deep-sea Coral Reefs.” Journal of Applied Ecology 60 (7): 1254–1273. https://doi.org/10.1111/1365-2664.14408.
  • Jadon, S. 2020. “A Survey of Loss Functions for Semantic Segmentation.” Paper presented at the Proceedings of the IEEE conference on computational intelligence in bioinformatics and computational biology (CIBCB), Electr Network, Via del Mar, Chile, October 27–29, 1–7.
  • Kendall, A., V. Badrinarayanan, and R. Cipolla. 2015. “Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding.” arXiv Preprint arXiv. https://doi.org/10.48550/arXiv.1511.02680.
  • Kervadec, H., J. Bouchtiba, C. Desrosiers, E. Granger, J. Dolz, and I. B. Ayed. 2021. “Boundary Loss for Highly Unbalanced Segmentation.” Medical Image Analysis 67:67. https://doi.org/10.1016/j.media.2020.101851.
  • Kim, T., K. Hong, and H. Byun. 2021. “The Feature Generator of Hard Negative Samples for Fine-Grained Image Recognition.” Neurocomputing 439:374–382. https://doi.org/10.1016/j.neucom.2020.10.032.
  • King, A., S. M. Bhandarkar, and B. M. Hopkinson. 2018. “A Comparison of Deep Learning Methods for Semantic Segmentation of Coral Reef Survey Images.” Paper presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, June 18–22, 1394–1402.
  • King, A., S. M. Bhandarkar, and B. M. Hopkinson. 2019. “Deep Learning for Semantic Segmentation of Coral Reef Images Using Multi-View Information.” Paper presented at the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, June 15–21, 1–10.
  • Kirillov, A., Y. Wu, K. He, and R. Girshick. 2020. “PointRend: Image Segmentation as Rendering.” Paper presented at the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, June 16–20, 9799–9808.
  • Kopecky, K. L., G. Pavoni, E. Nocerino, A. J. Brooks, M. Corsini, F. Menna, and R. J. Schmitt. 2023. “Quantifying the Loss of Coral from a Bleaching Event Using Underwater Photogrammetry and AI-Assisted Image Segmentation.” Remote Sensing 15 (16): 4077. https://doi.org/10.3390/rs15164077.
  • Krizhevsky, A., I. Sutskever, and G. Hinton. 2012. “Imagenet Classification with Deep Convolutional Neural Networks.” Advances in Neural Information Processing Systems 25 (2): 1–8.
  • Lamirand, M., P. Lozado-Misa, B. Vargas-Angel, C. Couch, B. Schumacher, and M. Winston. 2022. Analysis of Benthic Survey Images via CoralNet: A Summary of Standard Operating Procedures and Guidelines. Public Version.
  • Lam, C., C. Yu, L. Huang, and D. Rubin. 2018. “Retinal Lesion Detection with Deep Learning Using Image Patches.” Investigative Ophthalmology & Visual Science 59 (1): 590–596. https://doi.org/10.1167/iovs.17-22721.
  • LeCun, Y., Y. Bengio, and G. Hinton. 2015. “Deep learning.” Nature 521 (7553): 436–444. https://doi.org/10.1038/nature14539.
  • Li, C., S. F. Ding, X. Xu, H. W. Hou, and L. Ding. 2023. “Fast Density Peaks Clustering Algorithm Based on Improved Mutual K-Nearest-Neighbor and Sub-Cluster Merging.” Information Sciences 647:119470. https://doi.org/10.1016/j.ins.2023.119470.
  • Li, F., R. Fergus, and P. Perona 2004. “Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories.” Paper presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshop, Washington D.C. USA, June 27–July 2, 178–178.
  • Li, C., C. Guo, W. Ren, R. Cong, J. Hou, S. Kwong, and D. Tao. 2019. “An underwater image enhancement benchmark dataset and beyond.” IEEE Transactions on Image Processing 29:4376–4389. https://doi.org/10.1109/TIP.2019.2955241.
  • Lin, T. Y., P. Goyal, R. Girshick, K. He, and P. Dollár. 2017. “Focal Loss for Dense Object Detection.” Paper presented at the Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, October 22–29, 2980–2988.
  • Liu, G., S. F. Heron, C. M. Eakin, F. E. Muller-Karger, M. Vega-Rodriguez, L. S. Guild, and S. Lynds. 2014. “Reef-Scale Thermal Stress Monitoring of Coral Ecosystems: New 5-Km Global Products from NOAA Coral Reef Watch.” Remote Sensing 6 (11): 11579–11606. https://doi.org/10.3390/rs61111579.
  • Long, J., E. Shelhamer, and T. Darrell 2015. “Fully Convolutional Networks for Semantic Segmentation.” Paper presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, June 7–12, 3431–3440.
  • Lu, X., Z. Lin, X. Shen, R. Mech, and J. Z. Wang. 2015. “Deep Multi-Patch Aggregation Network for Image Style, Aesthetics, and Quality Estimation.” Paper presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, June 7–12, 990–998.
  • Lumini, A., L. Nanni, and G. Maguolo. 2020. “Deep Learning for Plankton and Coral Classification.” Applied Computing and Informatics 19 (3/4): 265–283. https://doi.org/10.1016/j.aci.2019.11.004.
  • Lütjens, M., and H. Sternberg. 2021. “Deep Learning Based Detection, Segmentation and Counting of Benthic Megafauna in Unconstrained Underwater Environments.” IFAC-Papers Online 54 (16): 76–82. https://doi.org/10.1016/j.ifacol.2021.10.076.
  • Lyons, M. B., N. J. Murray, E. V. Kennedy, E. M. Kovacs, C. Castro-Sanguino, S. R. Phinn, R. B. Acevedo, et al. 2024. “New Global Area Estimates for Coral Reefs from High-Resolution Mapping.” Cell Reports Sustainability 1 (2): 100015. https://doi.org/10.1016/j.crsus.2024.100015.
  • Mahmood, A., M. Bennamoun, S. An, F. A. Sohel, and F. Boussaid. 2020. “ResFeats: Residual Network Based Features for Underwater Image Classification.” Image and Vision Computing 93:103811. https://doi.org/10.1016/j.imavis.2019.09.002.
  • Mahmood, A., M. Bennamoun, S. An, F. A. Sohel, F. Boussaid, R. Hovey, R. B. Fisher, and R. B. Fisher. 2018. “Deep Image Representations for Coral Image Classification.” IEEE Journal of Oceanic Engineering 44 (1): 121–131. https://doi.org/10.1109/JOE.2017.2786878.
  • Mahmood, A., S. A. Bennamoun, F. A. Sohel, F. Boussaid, R. Hovey, G. Kendrick, and R. B. Fisher. 2016a. “Automatic Annotation of Coral Reefs Using Deep Learning.” Paper presented at the Proceedings of the Oceans 2016 mts/IEEE monterey, Monterey, CA, USA, September 19–23, 1–5.
  • Mahmood, A., S. A. Bennamoun, F. A. Sohel, F. Boussaid, R. Hovey, G. Kendrick, and R. B. Fisher. 2016b. “Coral Classification with Hybrid Feature Representations.” Paper presented at the Proceedings of the IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, September 25–28, 519–523.
  • Mahmood, A., S. A. Bennamoun, F. A. Sohel, F. Boussaid, R. Hovey, G. Kendrick, and R. B. Fisher. 2017. “Deep Learning for Coral Classification.” Handbook of Neural Computation 383–401. https://doi.org/10.1016/B978-0-12-811318-9.00021-1.
  • Marre, G., C. D. A. Braga, D. Ienco, S. Luque, F. Holon, and J. Deter. 2020. “Deep Convolutional Neural Networks to Monitor Coralligenous Reefs: Operationalizing Biodiversity and Ecological Assessment.” Ecological Informatics 59:101110. https://doi.org/10.1016/j.ecoinf.2020.101110.
  • Milletari, F., N. Navab, and S. A. Ahmadi. 2016. “V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation.” Paper presented at the Proceedings of the Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, October 25–28, 565–571.
  • Mittal, S., S. Srivastava, and J. P. Jayanth. 2022. “A Survey of Deep Learning Techniques for Underwater Image Classification.” IEEE Transactions on Neural Networks and Learning Systems 34 (10): 6968–6982. https://doi.org/10.1109/TNNLS.2022.3143887.
  • Mizuno, K., K. Terayama, S. Hagino, S. Tabeta, S. Sakamoto, T. Ogawa, and H. Fukami. 2020. “An Efficient Coral Survey Method Based on a Large-Scale 3-D Structure Model Obtained by Speedy Sea Scanner and U-Net Segmentation.” Scientific Reports 10 (1): 12416. https://doi.org/10.1038/s41598-020-69400-5.
  • Mizuno, K., K. Terayama, S. Tabeta, S. Sakamoto, Y. Matsumoto, Y. Sugimoto, and A. Kawakubo, et al. 2019. “Development of an Efficient Coral-Coverage Estimation Method Using a Towed Optical Camera Array System [Speedy Sea Scanner (SSS)] and Deep-Learning-Based Segmentation: A Sea Trial at the Kujuku-Shima Islands.” IEEE Journal of Oceanic Engineering 45 (4): 1386–1395. https://doi.org/10.1109/JOE.2019.2938717.
  • Modasshir, M., A. Q. Li, and I. Rekleitis. 2018. “MDNet: Multi-Patch Dense Network for Coral Classification.” Paper presented OCEANS 2018 MTS/IEEE Charleston, Charleston, SC, USA, October 22–25, 1–6.
  • Mohamed, H., K. Nadaoka, and T. Nakamura. 2022. “Automatic Semantic Segmentation of Benthic Habitats Using Images from Towed Underwater Camera in a Complex Shallow Water Environment.” Remote Sensing 14 (8): 1818. https://doi.org/10.3390/rs14081818.
  • Moniruzzaman, M., S. M. S. Islam, M. Bennamoun, and P. Lavery. 2017. “Deep Learning on Underwater Marine Object Detection: A Survey.” Paper presented at the Proceedings of the Advanced Concepts for Intelligent Vision Systems: 18th International Conference, Antwerp, Belgium, September 18–21, 150–160.
  • Muruga, P., A. C. Siqueira, and D. R. Bellwood. 2024. “Meta-Analysis Reveals Weak Associations Between Reef Fishes and Corals.” Nature Ecology & Evolution 8 (4): 1–10. https://doi.org/10.1038/s41559-024-02334-7.
  • Ninio, R., S. Delean, K. Osborne, and H. Sweatman. 2003. “Estimating Cover of Benthic Organisms from Underwater Video Images: Variability Associated with Multiple Observers.” Marine Ecology Progress Series 265:107–116. https://doi.org/10.3354/meps265107.
  • Pavoni, G., M. Corsini, M. Callieri, G. Fiameni, C. Edwards, and P. Cignoni. 2020. “On Improving the Training of Models for the Semantic Segmentation of Benthic Communities from Orthographic Imagery.” Remote Sensing 12 (18): 3106. https://doi.org/10.3390/rs12183106.
  • Pavoni, G., M. Corsini, M. Callieri, M. Palma, and R. Scopigno. 2019. “Semantic Segmentation of Benthic Communities from Ortho-Mosaic Maps.” International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 42:151–158. https://doi.org/10.5194/isprs-archives-XLII-2-W10-151-2019.
  • Pavoni, G., M. Corsini, N. Pedersen, V. Petrovic, and P. Cignoni. 2021. “Challenges in the Deep Learning-Based Semantic Segmentation of Benthic Communities from Ortho-Images.” Applied Geomatics 13 (1): 131–146. https://doi.org/10.1007/s12518-020-00331-6.
  • Pavoni, G., M. Corsini, F. Ponchio, A. Muntoni, C. Edwards, N. Pedersen, and P. Cignoni. 2022. “TagLab: AI-Assisted Annotation for the Fast and Accurate Semantic Segmentation of Coral Reef Orthoimages.” Journal of Field Robotics 39 (3): 246–262. https://doi.org/10.1002/rob.22049.
  • Pierce, J. P., Y. Rzhanov, K. Lowell, and J. A. Dijkstra. 2020. “Reducing Annotation Times: Semantic Segmentation of Coral Reef Survey Images.” Paper presented at the Proceedings of the Global Oceans 2020: Singapore-US Gulf Coast, Biloxi, MS, USA, October 5–30, 1–9.
  • Poongodi, M., M. Hamdi, and H. Wang. 2022. “Image and Audio Caps: Automated Captioning of Background Sounds and Images Using Deep Learning.” Multimedia Systems 29 (5): 1–9. https://doi.org/10.1007/s00530-022-00902-0.
  • Qin, J., M. Li, D. Li, J. Zhong, and K. Yang. 2022. “A Survey on Visual Navigation and Positioning for Autonomous UUVs.” Remote Sensing 14 (15): 3794. https://doi.org/10.3390/rs14153794.
  • Raphael, A., Z. Dubinsky, D. Iluz, and N. S. Netanyahu. 2020. “Neural Network Recognition of Marine Benthos and Corals.” Diversity 12 (1): 29. https://doi.org/10.3390/d12010029.
  • Rashid, A. R., and A. Chennu. 2020. “A Trillion Coral Reef Colors: Deeply Annotated Underwater Hyperspectral Images for Automated Classification and Habitat Mapping.” Data 5 (1): 19. https://doi.org/10.3390/data5010019.
  • Ronneberger, O., P. Fischer, and T. Brox 2015. “U-Net: Convolutional Networks for Biomedical Image Segmentation.” Paper presented at the Proceedings of the Medical Image Computing And Computer-assisted Intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5–9, 234–241.
  • Roy, K., D. Banik, D. Bhattacharjee, and M. Nasipuri. 2019. “Patch-Based System for Classification of Breast Histology Images Using Deep Learning.” Computerized Medical Imaging and Graphics 71:90–103. https://doi.org/10.1016/j.compmedimag.2018.11.003.
  • Salehi, S. S. M., D. Erdogmus, and A. Gholipour. 2017. “Tversky Loss Function for Image Segmentation Using 3D Fully Convolutional Deep Networks.” Paper presented at the Proceedings of the 8th International Workshop on Machine Learning in Medical Imaging, Quebec City, Canada, September 10, 379–387.
  • Shan, J., and L. Li. 2016. “A Deep Learning Method for Microaneurysm Detection in Fundus Images.” Paper presented at the Proceedings of the 1st IEEE International Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE), Washington, DC, USA, June 27–29, 357–358.
  • Shihavuddin, A. S. M., N. Gracias, R. Garcia, A. C. Gleason, and B. Gintert. 2013. “Image-Based Coral Reef Classification and Thematic Mapping.” Remote Sensing 5 (4): 1809–1841. https://doi.org/10.3390/rs5041809.
  • Song, G. H., X. G. Jin, G. L. Chen, and Y. Nie. 2016. “Two-Level Hierarchical Feature Learning for Image Classification.” Frontiers of Information Technology & Electronic Engineering 17 (9): 897–906. https://doi.org/10.1631/FITEE.1500346.
  • Song, H., S. R. Mehdi, Y. Zhang, Y. Shentu, Q. Wan, W. Wang, and H. Huang. 2021. “Development of Coral Investigation System Based on Semantic Segmentation of Single-Channel Images.” Sensors 21 (5): 1848. https://doi.org/10.3390/s21051848.
  • Souter, D., S. Planes, J. Wicquart, M. Logan, D. Obura, and F. Staub. 2021. “Status of Coral Reefs of the World: 2020: executive summary.” Global Coral Reef Monitoring Network (GCRMN) and International Coral Reef Initiative (ICRI). https://bvearmb.do/handle/123456789/3190.
  • Steffens, A., A. Campello, J. Ravenscroft, A. F. Clark, and H. Hagras. 2019. “Deep Segmentation: Using Deep Convolutional Networks for Coral Reef Pixel-Wise Parsing.” Paper presented at the Proceedings of the Conference and Labs of the Evaluation Forum (CLEF 2019) , Lugano, Switzerland, September 9–12. https://ceur-ws.org/Vol-2380/.
  • Strudel, R., R. Garcia, I. Laptev, and C. Schmid. 2021. “Segmenter: Transformer for Semantic Segmentation.” Paper presented at the Proceedings of the IEEE/CVF International Conference on Computer Vision, Electr Network, Montreal, QC, Canada, October 11–17, 7262–7272.
  • Sui, Y. W., K. X. Ming, M. Meghjani, N. Raghavan, C. Jegourel, and K. Kang. 2022. “An Automated Data Processing Pipeline for Coral Reef Monitoring.” Paper presented at the Proceedings of the OCEANS 2022 Hampton Roads, Hampton Roads, VA, USA, October 17–20, 1–6.
  • Thomas, T., P. Maurya, B. Manikandan, and N. B. Dessai. 2022. “Estimation of Coral Reef Area Through 2D Images: Deep Learning Way Using UNet.” SSRN preprint. Accessed August 20, 2022. http://papers.ssrn.com/.
  • Tobin, J., R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel. 2017. “Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World.” Paper presented at the Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (IROS), Vancouver, BC, Canada, September 24–28, 23–30.
  • Todd, P. A. 2008. “Morphological Plasticity in Scleractinian Corals.” Biological reviews 83 (3): 315–337. https://doi.org/10.1111/j.1469-185X.2008.00045.x.
  • Wang, P., E. Fan, and P. Wang. 2021. “Comparative Analysis of Image Classification Algorithms Based on Traditional Machine Learning and Deep Learning.” Pattern Recognition Letters 141:61–67. https://doi.org/10.1016/j.patrec.2020.07.042.
  • Wang, D., A. Khosla, R. Gargeya, H. Irshad, and A. H. Beck. 2016. “Deep Learning for Identifying Metastatic Breast Cancer.” https://doi.org/10.48550/arXiv.1606.05718.
  • Wang, Y., Q. Zhou, J. Liu, J. Xiong, G. Gao, X. Wu, and L. J. Latecki 2019. “LEDNet: A Lightweight Encoder-Decoder Network for Real-Time Semantic Segmentation.” Paper presented at the Proceedings of the IEEE International Conference On Image Processing (ICIP), Taipei, Taiwan, September 22–25, 1860–1864.
  • Williams, I. D., C. S. Couch, O. Beijbom, T. A. Oliver, B. Vargas-Angel, B. D. Schumacher, and R. E. Brainard. 2019. “Leveraging Automated Image Analysis Tools to Transform Our Capacity to Assess Status and Trends of Coral Reefs.” Frontiers in Marine Science 6:222. https://doi.org/10.3389/fmars.2019.00222.
  • Wyatt, M., B. Radford, N. Callow, M. Bennamoun, and S. Hickey. 2022. “Using Ensemble Methods to Improve the Robustness of Deep Learning for Image Classification in Marine Environments.” Methods in Ecology and Evolution 13 (6): 1317–1328. https://doi.org/10.1111/2041-210X.13841.
  • Xie, E., W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo. 2021. “SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers.” Paper presented at the Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS), Montreal, Canada, December 6–14, 12077–12090.
  • Xiong, X., L. Duan, L. Liu, H. Tu, P. Yang, D. Wu, and Q. Liu, et al. 2017. “Panicle-SEG: A Robust Image Segmentation Method for Rice Panicles in the Field Based on Deep Learning and Superpixel Optimization.” Plant Methods 13 (1): 1–15. https://doi.org/10.1186/s13007-017-0254-7.
  • Xu, L., M. Bennamoun, F. Boussaid, S. An, and F. Sohel. 2019. “Coral Classification Using DenseNet and Cross-Modality Transfer Learning.” Paper presented at the Proceedings of the International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, July 14–19, 1–8.
  • Xu, Y., B. Du, F. Zhang, and L. Zhang. 2018. “Hyperspectral Image Classification via a Random Patches Network.” ISPRS Journal of Photogrammetry and Remote Sensing 142:344–357. https://doi.org/10.1016/j.isprsjprs.2018.05.014.
  • Xu, Y., Y. Zhang, H. Wang, and X. Liu. 2017. “Underwater Image Classification Using Deep Convolutional Neural Networks and Data Augmentation.” Paper presented at the Proceedings of the IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC), Xiamen, China, October 22–25, 1–5.
  • Yan, Z., Y. Zhan, Z. Peng, S. Liao, Y. Shinagawa, S. Zhang, and X. S. Zhou, et al. 2016. “Multi-Instance Deep Learning: Discover Discriminative Local Anatomies for Bodypart Recognition.” IEEE Transactions on Medical Imaging 35 (5): 1332–1343. https://doi.org/10.1109/TMI.2016.2524985.
  • Ye, W., C. Liu, Y. Chen, Y. Liu, C. Liu, and H. Zhou. 2023. “Multi-Style Transfer and Fusion of Image’s Regions Based on Attention Mechanism and Instance Segmentation.” Signal Processing: Image Communication 110:116871. https://doi.org/10.1016/j.image.2022.116871.
  • Yu, F., and V. Koltun. 2015. “Multi-Scale Context Aggregation by Dilated Convolutions.” arXiv Preprint arXiv:1511.07122. https://doi.org/10.48550/arXiv.1511.07122.
  • Yuval, M., I. Alonso, G. Eyal, D. Tchernov, Y. Loya, A. C. Murillo, and T. Treibitz. 2021. “Repeatable Semantic Reef-Mapping Through Photogrammetry and Label-Augmentation.” Remote Sensing 13 (4): 659. https://doi.org/10.3390/rs13040659.
  • Yu, C., J. Wang, C. Peng, C. Gao, G. Yu, and N. Sang. 2018. “BiSeNet: Bilateral Segmentation Network for Real-Time Semantic Segmentation.” Paper presented at the Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, September 8–14, 325–341.
  • Zhang, H., A. Grün, and M. Li. 2022. “Deep Learning for Semantic Segmentation of Coral Images in Underwater Photogrammetry.” ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences V-2-2022:343–350. https://doi.org/10.5194/isprs-annals-V-2-2022-343-2022.
  • Zhang, Y., L. Yang, H. Zheng, P. Liang, C. Mangold, R. G. Loreto, D. P. Hughes, and D. Z. Chen. 2019. “SPDA: Superpixel-Based Data Augmentation for Biomedical Image Segmentation.” arXiv Preprint arXiv:1903.00035. https://doi.org/10.48550/arXiv.1903.00035.
  • Zhong, J., M. Li, H. Zhang, and J. Qin. 2023. “Fine-Grained 3D Modeling and Semantic Mapping of Coral Reefs Using Photogrammetric Computer Vision and Machine Learning.” Sensors 23 (15): 6753. https://doi.org/10.3390/s23156753.
  • Zhou, J., Q. Liu, Q. Jiang, W. Ren, K. M. Lam, and W. Zhang. 2023. “Underwater Camera: Improving Visual Perception via Adaptive Dark Pixel Prior and Color Correction.” International Journal of Computer Vision. https://doi.org/10.1007/s11263-023-01853-3.
  • Zhou, J., J. Sun, C. Li, Q. Jiang, M. Zhou, K. M. Lam, W. Zhang, and X. Fu. 2024. “HCLR-Net: Hybrid Contrastive Learning Regularization with Locally Randomized Perturbation for Underwater Image Enhancement.” International Journal of Computer Vision. https://doi.org/10.1007/s11263-024-01987-y.
  • Zhuang, J., J. Yang, L. Gu, and N. Dvornek. 2019. “ShelfNet for Fast Semantic Segmentation.” Paper presented at the Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, South Korea, October 27–November 2, 847–856.
  • Zoph, B., and Q. V. Le. 2016. “Neural Architecture Search with Reinforcement Learning.” arXiv Preprint arXiv:1611.01578. https://doi.org/10.48550/arXiv.1611.01578.

Appendix

Table A1. Nomenclature.