Research Article

Counting and measuring the size and stomach fullness levels for an intelligent shrimp farming system

Article: 2268878 | Received 30 May 2023, Accepted 05 Oct 2023, Published online: 18 Oct 2023

Abstract

The penaeid shrimp farming industry is experiencing rapid growth. To reduce costs and labour, automation techniques such as counting and size estimation are increasingly being adopted. Feeding based on the degree of stomach fullness can significantly reduce food waste and water contamination. Therefore, we propose an intelligent shrimp farming system that includes shrimp detection, approximate shrimp length measurement, shrimp counting, and two methods for determining the degree of digestive tract fullness. We introduce AR-YOLOv5 (Angular Rotation YOLOv5) in the system to enhance both shrimp growth and the environmental sustainability of shrimp farming. Our experiments were conducted in a real shrimp farming environment. The length and quantity are estimated based on the bounding box, and the level of stomach fullness is approximated using the ratio of the shrimp's digestive tract to its body size. In terms of detection performance, our proposed method achieves a precision rate of 97.70%, a recall rate of 91.42%, a mean average precision of 94.46%, and an F1-score of 95.42% using AR-YOLOv5. Furthermore, our stomach fullness determination method achieves an accuracy of 88.8%, a precision rate of 91.7%, a recall rate of 90.9%, and an F1-score of 91.3% in real shrimp farming environments.

1. Introduction

Shrimp farming is among the fastest-growing aquaculture industries in the world (Fisheries, Citation2011). In recent years, significant advancements have been made in shrimp farming automation, particularly in the development of automatic feeding mechanisms, which play a crucial role in this industry (Prasad et al., Citation2023). The amount of food required and the duration of feeding depend on the factors that influence the rate of ingestion in shrimp (Ge et al., Citation2022). Overfeeding not only leads to food wastage and increased costs but can also result in water contamination from leftover food. On the other hand, underfeeding may hinder growth and contribute to cannibalism (Astiyani et al., Citation2022). The mobility or agility of shrimp may also affect their ingestion of food. Shrimp tracking and counting can be used to determine their mobility or agility, as well as to detect sick or dead shrimp. Manually measuring the length of thousands of individuals in a farming pond is impractical. Therefore, automatic counting, size estimation, and assessment of the degree of stomach fullness are important for reducing labour and costs.

The digestive tract of shrimp is a straight tube running dorsally and is divided into three regions: the foregut, the midgut, and the hindgut (Davie et al., Citation2015). During feeding, the shrimp’s digestive tract fills up (Loya-Javellana et al., Citation1995), leading to an increase in the stomach repletion index, which represents the degree of stomach fullness (Nunes et al., Citation1997). Shrimps typically digest their food and empty their guts within 2–4 h. The stomach repletion index is used to measure the stomach fullness levels of shrimps, with 0% indicating empty guts and 100% representing fully filled guts (Nunes & Parsons, Citation2000). We utilised infrared cameras to capture the shrimp’s digestive tract for our analysis.

The visibility of an object in an image can be influenced by the distance between the object and the camera, as well as the turbidity of the water environment. Image processing techniques can effectively enhance the image quality and make shrimp guts clearer. Image matting (Weng, Chiu, et al., Citation2020), de-hazing (Liang et al., Citation2021; Weng, Chen, et al., Citation2019), de-scattering (Lu et al., Citation2016), de-blurring (Li et al., Citation2018) algorithms, and foreground-background separation algorithms (Weng, Li, et al., Citation2020) may be applicable in turbid water environments to enhance image quality and improve the accuracy of stomach fullness detection.

In this research, we present an innovative method for automatic counting, size estimation, and measurement of the level of stomach fullness. Our method combines multi-angle shrimp detection with the StrongSORT algorithm (Du et al., Citation2023), which is used for object tracking, to achieve accurate counting.

The contributions of this research can be summarised as follows:

  1. We replace the traditional YOLOv5 model with the proposed AR-YOLOv5 model, which facilitates the identification of shrimps by calculating the angle and the bounding box separately, thereby avoiding deviations in the angle calculation.

  2. We propose a method for automatically detecting shrimp bodies and measuring their lengths, even when multiple shrimps are overlapping. Our method aims to improve the accuracy and efficiency of shrimp detection in various environments.

  3. We propose two methods for automatically estimating the stomach fullness levels of shrimp that are applicable in shrimp farming. These methods have been tested in real farming environments and achieved an accuracy rate of approximately 90%. They effectively address the issue of multiple overlapping shrimps in dense populations and provide improved accuracy while utilising fewer computing resources.

2. Related work

For many years, target detection has been an active research area in the field of computer vision. Its application in shrimp farming has also been explored by researchers, such as Harbitz et al., who analyzed the pixels of shrimp images, segmented the shrimp, and calculated their carapace lengths (Harbitz, Citation2007). Traditional computer vision methods are unable to handle complex image information and capture important features in the images (Ward et al., Citation2021). With the emergence of deep learning, the integration of CNN (convolutional neural network) and computer vision has led to the widespread application of target detection methods in various domains (Redmon et al., Citation2016), including the shrimp farming industry.

Several studies have explored the use of CNN-based object detection models for analysing aquatic animals. For example, Nguyen et al. created a mask region-based CNN model to segment shrimp larvae and used this model to identify the number of whiteleg shrimp larvae in an image after removing the detected larvae segments (Nguyen et al., Citation2020). Koushik et al. utilised the Faster Region-Based Convolutional Neural Network (Faster R-CNN) to detect shrimp and draw bounding boxes around them (Koushik et al., Citation2021). Subsequently, they employed SVM for classification and segmentation to calculate the length of shrimp. This work achieved an accuracy of 95% and an approximate length value of around 2 centimetres. Similarly, Armalivia et al. proposed to detect shrimp larvae using a YOLOv3-based target detection model and annotated images of collected shrimp larvae to count them (Armalivia et al., Citation2021). However, the shrimp larvae in their images were not densely populated. Further, Zhang et al. used Light-YOLOv4 and local images to detect shrimps in a tank. The images were segmented into several parts, and the number of shrimps in each part was calculated separately and then summed to obtain the total number of shrimps (L. Zhang et al., Citation2022). Similarly, Lai et al. used YOLOv4-tiny and an underwater video system to count the number of Pacific white shrimp and evaluate their length (Lai et al., Citation2022). In another study, Wang et al. proposed a method for automated monitoring of Artemia length using UNet (Ronneberger et al., Citation2015) to extract the structure of the length measuring line, with second-order anisotropic Gaussian kernels used to convert the structure into actual length. Their method accurately measured the length of objects in Artemia images, achieving a mean absolute percentage error of 1.16% (G. Wang et al., Citation2020). Hong Khai et al. improved the Mask-CNN network using a parameter calibration strategy and performed regression fitting to count shrimp based on the density of shrimp in an image, classifying the density into low, medium, and high categories (Hong Khai et al., Citation2022). Other studies have also been conducted on counting various aquatic products (Kandimalla et al., Citation2022; Lainez & Gonzales, Citation2019). Still, these methods lack the ability to detect objects with changes in pose, scale, and orientation, requiring additional image processing techniques to accurately measure the length of shrimps. Moreover, calculating the length in images that often contain overlapping shrimps poses a significant challenge.

Currently, several rotation-based object detection methods are being applied in various fields. For example, Wang et al. proposed a text detection method based on YOLOv4 (X. Wang et al., Citation2021). To achieve rotation-based object detection, they designed a new IoU (intersection over union) algorithm that can approximate the rotation angle, named RDIoU-NMS. Chen et al. proposed a rotated detector for SAR ship detection, enhanced by a lightweight non-local attention module (Chen et al., Citation2020). The module was designed to effectively mitigate the problem of background interference. However, previous studies have simply modified the IoU to approximate the rotation angle. In contrast, we adopt a more effective approach by introducing CSL (Circular Smooth Label) (Yang, Hou, et al., Citation2021; Yang & Yan, Citation2020) into our proposed method. CSL maps the angle estimation from a regression problem to a classification problem and calculates the angle and the IoU separately, resulting in a more efficient and accurate algorithm. Still, there is no precedent for applying such rotation-based detectors in shrimp farming.

There have been few comprehensive studies on the feeding status of shrimp. Awotunde dissected shrimp specimens, removed their guts, and quantified the ratio of food weight in the guts to body weight as a measure of fullness. The shrimp were categorised into three groups according to this ratio: full, half-full, and empty (Awotunde, Citation2021). In contrast, Hashisho et al. utilised computer vision techniques to classify shrimp based on the level of filling in their digestive tracts using shrimp images. They employed segmentation techniques to isolate the shrimp and utilised ResNet10 for classification (Hashisho et al., Citation2021). Although this approach achieved an accuracy of 92%, it is worth noting that utilising two neural networks for inference in high-density shrimp environments imposes a significant computational burden and requires considerable resources (Lei et al., Citation2021).

In this paper, we present an intelligent shrimp farming system based on AR-YOLOv5. This system enables us to detect and measure shrimp length, count shrimp quantity, and determine their stomach repletion index in an innovative manner. The proposed method effectively tackles common challenges encountered in shrimp farming, including occlusion, scale variations, and complex environments. When integrated with water quality and environmental management systems (Zhang et al., Citation2019), it facilitates fully automated shrimp farming.

3. Methodology

In this study, we proposed AR-YOLOv5, which integrates YOLOv5 and CSL as the core of our system. By utilising the output bounding box generated by AR-YOLOv5, we were able to compute the quantity and the average length, as well as measure the stomach fullness levels of the shrimps.

3.1. Experimental environment

Previous research has predominantly been conducted in small tanks, where a single camera captured the entire environment, but such conditions do not accurately represent typical shrimp farming setups. In our experiments, our aim was to study realistic shrimp farming scenarios, which involve the use of large ponds. In this study, we use a real pond provided by the Department of Biotechnology and Bioindustry Science at National Cheng Kung University, as shown in Figure 1(b). The dimensions of the pond were as follows: a depth of 100 centimetres, a water depth of 75 centimetres, and a radius of 25 metres. Given the larger size of the pond, a single camera was unable to cover the entire surface area efficiently. To overcome this problem, we strategically positioned multiple underwater cameras in the pond (with the number of cameras depending on the size of the pond) to ensure comprehensive sampling of the entire area, as depicted in Figure 1(a). Furthermore, we utilised downward-facing infrared cameras to capture images of the shrimp. The camera lens was positioned at a short, fixed distance from its base, which was suitable for our research. In this experiment, we observed young juvenile grass shrimp as our experimental subjects over a four-month period (from August 2022 to December 2022).

Figure 1. Experimental environment. (a) Sketch of the pond. (b) The actual pond.


3.2. Overall process of our method

In the pre-processing stage, we first apply the Fast Fourier Transform (FFT) (Weng, Wang, et al., Citation2019) and a Butterworth high-pass filter (Makandar & Halalli, Citation2015) to the raw infrared image. Suppressing the low-frequency components balances the brightness of the image while retaining detail and reducing noise, so that the shrimp's stomach and digestive tract can be recognised more accurately in the subsequent processing steps. The output of this pre-processing is given to the AR-YOLOv5 model as input.
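
For illustration, the following is a minimal NumPy sketch of this pre-processing step; the cutoff radius and filter order are illustrative assumptions rather than values reported here.

```python
import numpy as np

def butterworth_highpass_preprocess(gray, cutoff=30, order=2):
    """Sketch of the FFT + Butterworth high-pass pre-processing step.

    Suppresses low-frequency illumination so the digestive tract stands out;
    `cutoff` and `order` are illustrative choices, not reported values.
    """
    rows, cols = gray.shape
    u = np.arange(rows) - rows / 2
    v = np.arange(cols) - cols / 2
    # Distance of each frequency component from the centre of the spectrum.
    dist = np.sqrt(u[:, None] ** 2 + v[None, :] ** 2)
    # Butterworth high-pass transfer function H(u, v).
    H = 1.0 / (1.0 + (cutoff / (dist + 1e-6)) ** (2 * order))

    spectrum = np.fft.fftshift(np.fft.fft2(gray.astype(np.float64)))
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * H)).real

    # Rescale to the 8-bit range expected by the detector.
    filtered -= filtered.min()
    return (filtered / (filtered.max() + 1e-6) * 255).astype(np.uint8)
```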

Next, AR-YOLOv5 identifies each shrimp and generates a rotated bounding box for it. By processing these bounding boxes, we obtain the number of shrimps, their mobility, their lengths, and their stomach fullness levels. We use StrongSORT only to count the number of shrimps and determine their mobility. In addition, AR-YOLOv5 calculates their lengths from the number of pixels along the longer side of the bounding box, scales the bounding box to locate the shrimp's digestive tract, and estimates the stomach fullness level from the ratio of the digestive tract to the bounding box of its body. The flowchart of the method is shown in Figure 2.

Figure 2. Overall process flow of the proposed method.


3.3. AR-YOLOv5: the rotation-based object detection

In this section, we present the details of the proposed rotation-based object detection method implemented in YOLOv5. We call this model AR-YOLOv5; it is an end-to-end object detection model that can effectively detect and classify objects based on their shape and angle. The detected angled bounding box makes it easy to estimate the size of the shrimp (width and length in pixels), which can then be translated into actual size, even when many shrimps overlap in the image.

The AR-YOLOv5 model architecture is shown in Figure 3. The YOLO network can be divided into three parts, namely the backbone, neck, and head. We use the CSPDarknet53 (Bochkovskiy et al., Citation2020) model as the backbone, which performs convolution, normalisation, and activation on the input image. The backbone is mainly responsible for extracting the features of the image. We combine SPPF (Spatial Pyramid Pooling Fast) (He et al., Citation2015) and CSP-PAN (Yu et al., Citation2021) modules as the neck, which is responsible for concatenating and merging the features obtained from the backbone and then passing the merged result into the head to obtain the final output of the model. Finally, the detection part maps the output of the head to anchor boxes, classes, angles, and confidences and displays the predicted boxes, target types, and confidences on the image. Notably, we use the SPPF module instead of the common SPP (Spatial Pyramid Pooling) (Redmon & Farhadi, Citation2018) module because SPPF achieves the same result as SPP with fewer computations, as shown in Figure 3. Compared to SPP, using SPPF can reduce the number of parameters by half while achieving the same results.
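
As a brief illustration of why SPPF is cheaper, the PyTorch sketch below (with illustrative channel sizes) chains three small max-pooling layers so that their stacked receptive fields reproduce the parallel 5×5/9×9/13×13 pools of SPP while sharing computation.

```python
import torch
import torch.nn as nn

class SPPF(nn.Module):
    """Sketch of the SPPF idea: sequential pooling instead of parallel SPP pools."""

    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_hidden = c_in // 2
        self.cv1 = nn.Conv2d(c_in, c_hidden, kernel_size=1)
        self.cv2 = nn.Conv2d(c_hidden * 4, c_out, kernel_size=1)
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)    # receptive field of a single 5x5 pool
        y2 = self.pool(y1)   # equivalent to a 9x9 pool over x
        y3 = self.pool(y2)   # equivalent to a 13x13 pool over x
        return self.cv2(torch.cat([x, y1, y2, y3], dim=1))
```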

Figure 3. Network architecture of AR-YOLOv5.


The training process of the proposed AR-YOLOv5 model is shown in Figure 4. First, the input image is pre-processed (e.g. the Fourier-transform-based filtering described above) and then passed into the YOLO network to obtain the model's predictions. We then calculate the CIoU (Complete-IoU) (Zheng et al., Citation2020), confidence, class, and angular losses from the predictions and the ground truth through four loss functions. Finally, the loss is backpropagated to update the gradients. It is worth mentioning that YOLOv5 uses methods such as GIoU (Generalized IoU), CIoU, and DIoU (Distance-IoU) to calculate the bounding box loss, which are suitable for computing the IoU between horizontal bounding boxes. However, these methods cannot be used for rotated objects, as it is not possible to differentiate the IoU between two rotated boxes (Sun et al., Citation2022). To address this issue, we use the CSL algorithm and include a new loss function, $L_{angular}$, which specifically handles the angle. Simply put, we calculate the CIoU and the angle separately. This effectively solves the problem of detecting rotated objects.

Figure 4. Training process of AR-YOLOv5.


Here we propose the AR-YOLOv5 method for shrimp detection, which incorporates an angular classification prediction layer into the head network of the YOLO framework. As shown in Figure 4, this approach introduces the angular classification loss into the initial loss function:

(1) $LOSS = L_{CIoU} + L_{confidence} + L_{class} + L_{angular}$

where $L_{CIoU}$, $L_{confidence}$, $L_{class}$ and $L_{angular}$ denote the CIoU loss, confidence loss, classification loss, and angular classification loss, respectively.

The CIoU loss is defined as:

(2) $L_{CIoU} = 1 - IoU + \dfrac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha v$

(3) $v = \dfrac{4}{\pi^{2}}\left(\arctan\dfrac{w^{gt}}{h^{gt}} - \arctan\dfrac{w}{h}\right)^{2}$

(4) $\alpha = \dfrac{v}{(1 - IoU) + v}$

IoU is a metric that quantifies the overlapping area between the predicted and ground truth bounding boxes. It is defined as the ratio of the intersection and union of the predicted and ground truth bounding boxes. $\rho(b, b^{gt})$ is the Euclidean distance between the centres of the two boxes, and $c$ is the diagonal length of the smallest enclosing box that covers both bounding boxes. Moreover, $\alpha$ is a trade-off parameter, and $v$ measures the consistency of the aspect ratio; $(w, h)$ denote the width and height of the predicted bounding box, and $(w^{gt}, h^{gt})$ the width and height of the ground truth bounding box. Binary cross-entropy (BCE) logits loss is used to calculate $L_{confidence}$, $L_{class}$ and $L_{angular}$, which are all computed in the same manner:

(5) $L_{confidence} = \sum_{i=0}^{S\times S} BCE(P_{confidence}, T_{confidence})$

(6) $L_{class} = \sum_{i=0}^{S\times S} BCE(P_{class}, T_{class})$

(7) $L_{angular} = \sum_{i=0}^{S\times S} BCE(P_{\theta}, T_{\theta})$

The predicted offset vector is represented by $P_{confidence}$, while the true vector is represented by $T_{confidence}$. The feature map consists of $S\times S$ cells; the predicted probability distribution over the different classes is denoted by $P_{class}$, while the ground-truth distribution is denoted by $T_{class}$. The angle prediction and label are denoted by $P_{\theta}$ and $T_{\theta}$, respectively, with the label encoded by CSL, as defined in Equation (9).

BCE is defined as:

(8) $BCE(P, T) = -\sum_{n=1}^{N}\left[P\log(T) + (1 - P)\log(1 - T)\right]$

Here, $N$ is the number of predictors (anchors) in each grid cell, and $P$ and $T$ denote the corresponding predicted and true vectors, respectively.

(9) $CSL(x) = \begin{cases} g(x), & \theta - r < x < \theta + r \\ 0, & \text{otherwise} \end{cases}$

We aimed to improve angular classification performance by classifying angles into 180 categories and applying the CSL method. The original label is denoted by $x$, and we used a window function $g(x)$ to enhance the classification. The window function can take one of four forms: a rectangular, triangular, Gaussian, or pulse function. The variable $r$ represents the radius of the window function, and $\theta$ denotes the angle of the current bounding box.
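
As an illustration of the CSL encoding in Equation (9), the sketch below builds the 180-class angular target with a Gaussian window; the window radius is an illustrative assumption.

```python
import numpy as np

def circular_smooth_label(angle_class, num_classes=180, radius=6):
    """Sketch of Equation (9): CSL target for one ground-truth angle class.

    Uses a Gaussian window g(x); `radius` is an illustrative choice.
    """
    classes = np.arange(num_classes)
    # Circular distance from each class index to the ground-truth angle class.
    diff = np.abs(classes - angle_class)
    dist = np.minimum(diff, num_classes - diff)
    label = np.exp(-(dist ** 2) / (2.0 * (radius / 3.0) ** 2))  # Gaussian window
    label[dist > radius] = 0.0                                  # outside the window
    return label

# Example: a shrimp at angle class 45 yields a smooth peak around index 45.
target = circular_smooth_label(45)
```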

The most commonly used representations in rotated object detection are point-based, describing a bounding box by its four corners (x1, y1, x2, y2, x3, y3, x4, y4), and angle regression-based, as in related work (Parra-Flores et al., Citation2019; Yang, Pan, et al., Citation2021; Yang et al., Citation2020). Instead, we use CSL as the positive-sample label method and the side length (long-edge) definition as the angle representation, rather than the commonly used angle definition. During the training phase, the ground truth of a rotated region is represented by (x, y, w, h, θ), where (x, y) are the coordinates of the centre point of the anchor box in the image coordinate system and θ represents the angle class. The long edge is defined as h, and the angle between h and the x-axis is defined as θ, where θ ∈ [−90, 90], as shown in Figure 5. Using the side length definition combined with CSL, we effectively solve the problems of excessive loss and boundary handling caused by traditional point-based labels and by methods that regress the angle. It also allows new ground-truth values to be calculated easily after rotating training images.
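
For clarity, the sketch below converts a four-point rotated box into the long-edge (x, y, w, h, θ) representation described above; the consecutive corner ordering is an assumption of the example.

```python
import numpy as np

def long_side_representation(pts):
    """Sketch: four corner points -> (cx, cy, w, h, theta) with the long-edge definition.

    `pts` is a (4, 2) array of consecutive corners; h is the longer edge and
    theta is its angle to the x-axis, folded into [-90, 90).
    """
    pts = np.asarray(pts, dtype=float)
    cx, cy = pts.mean(axis=0)
    e1, e2 = pts[1] - pts[0], pts[2] - pts[1]      # two adjacent edges
    len1, len2 = np.linalg.norm(e1), np.linalg.norm(e2)
    long_edge = e1 if len1 >= len2 else e2
    h, w = max(len1, len2), min(len1, len2)
    theta = np.degrees(np.arctan2(long_edge[1], long_edge[0]))
    if theta >= 90:                                 # fold the undirected edge angle
        theta -= 180
    elif theta < -90:
        theta += 180
    return cx, cy, w, h, theta
```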

Figure 5. The side length definition.


It is worth noting that accurately estimating the length and width of overlapping shrimps using regular YOLOv5 and segmentation methods is challenging, as illustrated in Figure 6(a), (b) and (c). Figure 6(b) and (c) depict the results of segmentation using the regular method when dealing with overlapping shrimps, revealing that it is difficult to estimate the length correctly. On the other hand, Figure 6(d) displays a more precise result using our AR-YOLOv5, with each shrimp length given in mm. Our method not only accurately detects and separates two overlapping shrimps but also precisely estimates their sizes without additional image segmentation processing.

Figure 6. Comparison of the size estimation methods between regular YOLOv5 and the proposed method in the overlapping shrimps. (a) Regular YOLOv5. (b)(c) Regular YOLOv5 failed to estimate size by segmentation. (d) Our angle detection-based size estimation result.


3.4. Estimating the length of shrimps

Since there are no significant changes in shrimp growth within a day (Robertson et al., Citation1993), we sample and compute the average size detected by several cameras on that day, obtained by dividing the accumulated length by the number of shrimps. We would like to emphasise that our method only considers shrimp that are fully visible within the field of view, i.e. shrimp whose bounding boxes lie entirely within the image boundaries.

(10) $\text{average\_length} = \dfrac{1}{nm}\sum_{i=1}^{n}\sum_{s=1}^{m} \text{ShrimpLength}_{is}$

where average_length is the estimated average length over a sampling period across the cameras in the pond, $\text{ShrimpLength}_{is}$ is the length of shrimp s in frame i, m is the number of shrimps in a single frame, and n is the total number of frames.
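
A minimal sketch of Equation (10) is given below, accumulating the lengths of all fully visible shrimp across the sampled frames and dividing by their total count; the data layout is illustrative.

```python
def average_length(frames):
    """Sketch of Equation (10).

    `frames` is a list of frames, each a list of estimated lengths (in mm)
    for shrimp whose rotated boxes lie fully inside the image.
    """
    total = sum(length for frame in frames for length in frame)
    count = sum(len(frame) for frame in frames)
    return total / count if count else 0.0

# Example: two frames with two and three fully visible shrimp.
avg = average_length([[51.2, 49.8], [50.5, 52.1, 48.9]])
```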

3.5. Counting the number of shrimps

In this study, the shrimp farming video was supported by the Department of Biotechnology and Bioindustry Science at National Cheng Kung University. Due to the large size of the pond, a single camera was insufficient to capture all the shrimp for counting in a single shot. To address this limitation, we utilised multiple underwater infrared cameras developed by National Sun Yat-sen University to sample the captured shrimps under the cameras. However, this approach presented a particular technical challenge, namely, the movement of shrimp in and out of the camera's range. As we focused on detection rather than recognition, it became difficult to determine whether a shrimp re-entering the camera's range was the same individual or a different one. Given the difficulty in accurately identifying individual shrimp, we assumed that each shrimp entering the camera's field of view should be counted as a new individual.

Another issue arose when a shrimp entered the camera’s range, moved within the camera’s view for a period of time, and then suddenly became undetected and re-detected. This caused YOLO to recognise it as a new shrimp. To address this, we employed the StrongSORT tracking algorithm after the AR-YOLOv5 detected a shrimp. This ensured that moving shrimps within the camera’s view were consistently recognised as the same individual. Once a shrimp was detected, it was passed to the StrongSORT algorithm for tracking and counting as the same entity. StrongSORT is a hierarchical matching algorithm where the network takes the output of the object detector (bounding boxes, confidences, and features) as input. The confidences are primarily used to filter out false positives, while the bounding boxes and features are matched with the targets for later tracking. The StrongSORT algorithm utilises the Kalman filter algorithm to generate trackers based on the targets in the previous frame, followed by the utilisation of the Hungarian algorithm for matching.
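
The counting logic that follows from this design can be summarised by the sketch below: every distinct track ID returned by the tracker is counted once, so a shrimp re-entering the field of view (and thus receiving a new ID) is counted as a new individual. The per-frame data layout is illustrative, not the actual tracker interface.

```python
def count_shrimps(tracked_frames):
    """Sketch: count shrimps as the number of distinct track IDs.

    `tracked_frames` is an iterable of frames, each a list of
    (track_id, rotated_box) pairs produced by the tracking stage.
    """
    seen_ids = set()
    for frame in tracked_frames:
        for track_id, _box in frame:
            seen_ids.add(track_id)
    return len(seen_ids)
```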

3.6. Level of stomach fullness

In this section, we present two methods for assessing the degree of digestive tract fullness in shrimps, enabling quick calculation of stomach fullness levels using image processing techniques. By determining the stomach fullness level, shrimp farmers can make informed decisions regarding their feeding strategies. In this experiment, to present our findings, we classified the stomach fullness levels into only two categories: “Full” when the stomach was more than half full and “Empty” when the stomach was less than half full (Awotunde, Citation2021). The corresponding images depicting different stomach fullness levels are shown in Table 1. It is important to highlight that shrimp farmers have the flexibility to establish their own criteria based on their understanding of the shrimp's condition.

Table 1. Degree of stomach fullness mapping to Full/Empty.

3.6.1. Method 1

We first need to locate the stomach of the shrimp and use the ratio between the stomach region and the entire shrimp to determine the level of stomach fullness. To locate the digestive tract, we scale the bounding box of the whole shrimp inward by a fixed ratio on each side (up, down, left, and right) to outline the bounding box of the stomach, which we call the Inner bounding box, while the bounding box that encompasses the entire shrimp is referred to as the Outer bounding box, as shown in Figure 7.

(11) $\text{ratio} = \dfrac{\frac{1}{I}\sum_{i=1}^{I} P_i}{\frac{1}{B}\sum_{b=1}^{B} P_b}$

Equation (11) gives the ratio, which is subsequently normalised and mapped to the stomach fullness levels. $P_i$ is the value of pixel i within the Inner bounding box, and $P_b$ is the value of pixel b within the Outer bounding box. The total number of pixels within the Inner bounding box is I, and within the Outer bounding box it is B.
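
A minimal sketch of Method 1 is given below; it uses an axis-aligned crop for simplicity (the actual system works on rotated boxes), and the inner-box scale is an illustrative assumption.

```python
import numpy as np

def fullness_ratio_method1(gray, outer_box, scale=0.5):
    """Sketch of Equation (11): mean brightness of the Inner box over the Outer box.

    `outer_box` is (x0, y0, x1, y1) around the whole shrimp; the Inner box is
    the Outer box shrunk by `scale` about its centre (illustrative value).
    """
    x0, y0, x1, y1 = outer_box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    hw, hh = (x1 - x0) * scale / 2.0, (y1 - y0) * scale / 2.0
    inner = gray[int(cy - hh):int(cy + hh), int(cx - hw):int(cx + hw)]
    outer = gray[int(y0):int(y1), int(x0):int(x1)]
    # Ratio of mean pixel values, later normalised to a fullness level.
    return float(inner.mean()) / (float(outer.mean()) + 1e-6)
```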

Figure 7. Inner bounding box and Outer bounding box of shrimp were used to calculate the stomach fullness levels.


The issue with this approach arises when there is more sediment between the Inner bounding box and the Outer bounding box. This amplifies the brightness of the Outer bounding box, leading to a larger margin of error in the judgment.

3.6.2. Method 2

To address the issue with Method 1, we designed another method for computing stomach fullness levels that uses only the Inner bounding box. This approach resolves the problem of the Outer bounding box being overly bright and requires less computation than Method 1. After the model determines the Inner bounding box, we apply Otsu's thresholding to it (Otsu, Citation1979), as shown in Figure 8, resulting in a binary (black-and-white) image of the shrimp's digestive tract. We then calculate the ratio of the number of white pixels to the size of the Inner bounding box to determine the stomach fullness level, expressed as a percentage (multiplied by 100).
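
A minimal sketch of Method 2, assuming OpenCV is available for Otsu's thresholding:

```python
import cv2
import numpy as np

def fullness_percentage_method2(inner_box_gray):
    """Sketch of Method 2: Otsu threshold on the greyscale Inner bounding box crop.

    Returns the proportion of white (digestive-tract) pixels as a percentage.
    """
    _, binary = cv2.threshold(inner_box_gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return 100.0 * np.count_nonzero(binary) / binary.size
```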

Figure 8. Highlight of the shrimp’s digestive tract segmented using Otsu’s thresholding method.


3.7. Evaluation of model performance

We use common evaluation metrics for object detection, including precision (P), recall (R), F1-score (F1), and mean average precision (mAP), as the criteria for evaluating our experimental results. Precision is the proportion of correctly predicted positive samples among all predicted positive samples; recall is the proportion of correctly predicted positive samples among all ground-truth positive samples; F1 combines precision and recall into a single metric; and mAP is the average of the per-class average precisions computed over different recall levels. They were calculated using Equations (12)–(15).

(12) $P = \dfrac{TP}{TP + FP} \times 100\%$

(13) $R = \dfrac{TP}{TP + FN} \times 100\%$

(14) $F1 = \dfrac{2 \times P \times R}{P + R} \times 100\%$

(15) $mAP = \dfrac{\sum_{c=1}^{C} AP(c)}{C}$

The number of correctly predicted positive samples is represented by TP, the number of incorrectly predicted positive samples is represented by FP, the number of incorrectly predicted negative samples is represented by FN, and C represents the number of detection classes. In our experiment, the number of boxes that were correctly identified as shrimp, the number of boxes that were incorrectly identified as shrimp, and the number of boxes that were not identified as shrimp are denoted by TP, FP, and FN, respectively. And since the focus of this study is only on shrimp, C = 1.
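
For reference, Equations (12)–(14) reduce to the short sketch below given box-level counts; with a single class (C = 1), mAP in Equation (15) is simply the average precision of the shrimp class.

```python
def detection_metrics(tp, fp, fn):
    """Sketch of Equations (12)-(14) from box-level counts, in percent."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return 100 * precision, 100 * recall, 100 * f1
```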

4. Experiments and results

4.1. Experimental dataset

Our primary concern is the accuracy and precision of detecting the shrimp's digestive tract. While there are some existing open-source shrimp datasets containing images and labels, the digestive tract is not visible in these images due to the type of shrimp (such as adult grass shrimp, which have darker shells), the environment (such as deteriorated water or low indoor lighting), and the camera equipment used. Therefore, these images cannot be used as our experimental data. To address this issue, we captured a series of shrimp videos at the National Cheng Kung University shrimp farming site using an underwater infrared camera. We randomly selected 370 images from the video sequences recorded over the past two years as our training set and randomly selected another 45 images from the remaining videos as our testing set. All greyscale images have a resolution of 720 × 480 pixels.

4.2. Implementation details

All our experiments were conducted on an Intel i7-8700 3.20 GHz CPU with 16 GB of memory and an NVIDIA GTX TITAN X with 12 GB of GPU memory. The implementation environment uses Python 3.8 and PyTorch 1.10.0 on Ubuntu 20.04. Our AR-YOLOv5 uses CSPDarknet53 as the backbone. In the training phase, the number of iterations (epochs) depends on the loss function (training stops when the loss stays below 0.5 for a certain period), the batch size is set to 8, and the weight decay and the momentum of the optimisation function are set to $5\times10^{-4}$ and 0.9, respectively. In the testing phase, we set an IoU threshold of 0.1 to discard duplicate detections.

4.3. Model training

In our experiment, we trained multiple versions of the AR-YOLOv5 model, namely AR-YOLOv5n, AR-YOLOv5s, AR-YOLOv5m, AR-YOLOv5l, and AR-YOLOv5x, using a dataset consisting of 370 images. These models vary in the depth (number of blocks) and width (number of channels) of the BottleNeckCSP module. The training results are presented in Figure 9, which shows the loss function value curves. We observed that our models trained successfully, as the loss values on the training set gradually decreased over the training iterations. Eventually, all five models with different parameters converged and stabilised.

Figure 9. Loss function value curve: Comparison of the Loss for five different AR-YOLOv5 models on the training set.


4.4. Model testing

In this section, we evaluated the five models on our testing set. Figure 10 displays the Precision-Recall (PR) curves for the five AR-YOLOv5 models with different parameter counts. As shown, the purple curve of AR-YOLOv5x, which has the largest number of parameters, consistently outperforms the other models. AR-YOLOv5l, with the second-largest number of parameters, performs better than the remaining models but not as well as AR-YOLOv5x. On the other hand, AR-YOLOv5s and AR-YOLOv5n, with the fewest parameters, exhibit the lowest PR curves, indicating that model accuracy increases with the number of parameters.

Figure 10. PR curve: Performance comparison of the five different AR-YOLOv5 models on the testing set.


4.4.1. Comparison of model: with and without data preprocessing

We compared the five models with and without data preprocessing. The results with data preprocessing are presented in Table 2, while the results without data preprocessing are shown in Table 3. With data preprocessing, AR-YOLOv5 achieved precision (P), F1-score (F1), and mean average precision (mAP) scores above 90% across the different model sizes, with recall (R) reaching above 80%. The results of the models without image preprocessing, such as FFT and Butterworth high-pass filtering, are presented in Table 3. These results indicate that without image preprocessing, the performance of our model deteriorates significantly compared to when preprocessing is applied, highlighting the crucial role of image preprocessing in our approach. Furthermore, these results demonstrate that using AR-YOLOv5 for shrimp detection in images captured by the underwater infrared camera is a feasible and effective approach.

Table 2. Comparison of different models with data preprocessing.

Table 3. Comparison of different models without data preprocessing.

4.5. Size-quantity-level of stomach fullness

In this section, we evaluate the performance of our farming system, utilising a dataset distinct from the images used for model training and testing. This dataset reflects natural conditions, devoid of any deliberate manipulation of the shrimp population. The pond has a significant population of shrimps with varying lengths and levels of stomach fullness.

4.5.1. Estimating the length of shrimps

Figure 11 illustrates the experimental results of our long-term (four-month) growth measurement. The blue line represents the true length, determined by manually measuring 20 randomly selected prawns caught on the same day; their lengths were summed and divided by 20 to obtain the true length. The orange line represents the average_length (Equation 10) predicted by our method on the same day. The left y-axis denotes the true and predicted shrimp length in millimetres, while the right y-axis indicates the error percentage, presented as pink bars and calculated as the absolute difference between the true and estimated lengths divided by the true length. The x-axis represents the dates in 2022. We took multiple samples on different dates, and the estimation results closely match the true values. The t-statistic is 0.272 and the p-value is 0.781, so the difference between the means of the two data sets is not statistically significant. Since there are thousands of shrimps in the pond, manually measuring the length of each shrimp is impractical. As our method samples all the shrimps captured by the cameras, it yields a smoother upward length curve than the manual method. This indicates that our estimated length better represents the average length in the pond while reducing the labour cost of manual measurement.
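
The error percentage and the significance test can be reproduced with the short sketch below; the length arrays are placeholders, not the measured values plotted in Figure 11.

```python
import numpy as np
from scipy import stats

# Placeholder per-date lengths in mm (the real values are shown in Figure 11).
true_lengths = np.array([52.0, 55.1, 58.3, 61.0])
estimated_lengths = np.array([51.4, 55.8, 57.9, 61.6])

# Error percentage per sampling date (the pink bars in Figure 11).
error_pct = np.abs(true_lengths - estimated_lengths) / true_lengths * 100.0

# Two-sample t-test comparing the manual and estimated means.
t_stat, p_value = stats.ttest_ind(true_lengths, estimated_lengths)
```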

Figure 11. Shrimp length estimation over four months of growth and its error percentage.

4.5.2. Counting the number of shrimps

The video footage from the cameras was sampled at 10-minute intervals, and the total number of shrimps captured by each camera at each sampling point was taken as the true value. We collected a total of 10 sampling points to evaluate the accuracy of the shrimp count. The estimated values represent the model's estimations of shrimp counts at the sampling points. The quantity distribution, absolute measurement error, and error percentage are shown in Table 4. The absolute measurement error was obtained by taking the absolute difference between the true value and the estimated value, and the error percentage was calculated by dividing the absolute measurement error by the true value.

Table 4. Absolute measurement error and error percentage of shrimp’s quantity.

Overall, the estimated shrimp count closely corresponded to the true values, except for a significant discrepancy observed during the fourth time interval. This was caused by shrimp being only partially visible at the image boundary, for example, when only the head or tail entered the camera's view and quickly left it again. In such cases, the system did not count the shrimp because their length could not be estimated.

4.5.3. Level of stomach fullness

We selected 9 shrimp as samples to demonstrate the varying levels of digestive tract fullness, as depicted in Figure 12. The top row displays shrimp detected by our method as having a stomach fullness level below 12.5%. The middle row represents shrimp with a stomach fullness level from 12.5% to 50.0%. The bottom row shows shrimp with a stomach fullness level exceeding 50%. According to the definition used in our experiment, the shrimp in the bottom row are categorised as “Full”, while the remaining shrimp are categorised as “Empty”.

Figure 12. Example of stomach fullness levels. Top row: stomach fullness level < 12.5%. Middle row: 12.5% ≤ stomach fullness level < 50.0%. Bottom row: stomach fullness level ≥ 50.0%.

Table 5 presents the performance of the two methods for determining the degree of stomach fullness on the testing set. Method 1 utilised both the Outer bounding box and the Inner bounding box, while Method 2 used only the Inner bounding box. We manually assessed the digestive tract fullness of 169 shrimps as either greater than or less than 50% and used these assessments as the ground truth. These ground-truth values were then compared with the predictions of the two methods, and accuracy, precision, recall, and F1-score were calculated accordingly. It is worth mentioning that Method 1 may exhibit lower accuracy in scenes with a large amount of feed or faecal sediment, as the average brightness of the Outer bounding box is affected. In contrast, Method 2 can make accurate determinations under such conditions.

Table 5. Evaluation metric results of two different methods.

5. Conclusion and future work

An intelligent system is crucial for reducing the manpower and costs involved in shrimp farming. To achieve this, we propose AR-YOLOv5 (Angular Rotation YOLOv5), which leverages multi-angle object detection to provide innovative and accurate size estimation and stomach fullness level assessment, together with a multi-object tracking model that provides accurate counting over the duration of the captured video. The detection performance on the testing set demonstrates a precision of 97.70%, a recall rate of 91.42%, a mean average precision of 94.46%, and an F1-score of 95.42%. For stomach fullness level detection, our method achieves an accuracy of 91.1%, a precision of 98.9%, a recall of 86.24%, and an F1-score of 98.5%. This approach effectively addresses the challenges posed by multiple overlapping shrimps, leading to improved accuracy in shrimp feeding and reduced feed waste and water pollution, thereby contributing to environmental sustainability. We believe that our proposed method has the potential to be widely adopted in the aquaculture industry and to make a significant contribution to its growth.

There are several limitations of our current system. Firstly, when the water quality is very turbid due to the accumulation of feed and shrimp excreta, visibility may be negatively impacted. Secondly, as grass shrimp grow, their shells become thicker and darker, making their stomach and digestive tract hardly visible. These factors can degrade the system's performance. Finally, we only measure the size of shrimps that are fully visible within the camera's field of view.

In terms of future work, we aim to measure shrimp size even when only part of the shrimp is within the camera's view. Additionally, we intend to measure the shrimp's mobility or agility, automatically detect dead and sick shrimps, and estimate the amount of leftover food in the pond.

Acknowledgement

This research was supported in part by Higher Education Sprout Project, Ministry of Education to the Headquarters of University Advancement at National Cheng Kung University (NCKU).

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This research was financially supported in part by Higher Education Sprout Project, Ministry of Education to the Headquarters of University Advancement at National Cheng Kung University.

References

  • Armalivia, S., Zainuddin, Z., Achmad, A., & Wicaksono, M. A. (2021). Automatic Counting Shrimp Larvae Based You Only Look Once (YOLO). Paper presented at the 2021 International Conference on Artificial Intelligence and Mechatronics Systems (AIMS).
  • Astiyani, W. P., Sudino, D., Prama, E. A., & Akbarurrasyid, M. (2022). Application of dissemination plastic ponds technology to Litopenaeus vannamei shrimp culture as a solution for aquaculture activities in narrow soil. Aquaculture, Aquarium, Conservation & Legislation, 15(4), 2152–2157.
  • Awotunde, O. (2021). Stomach and gut content of Long Neck Croacker–Pseudotolithus typus (Bleeker, 1863) from Lagos Lagoon, Nigeria. Annals of Marine Science, 5(1), 001–006. https://doi.org/10.17352/ams.000024
  • Bochkovskiy, A., Wang, C.-Y., & Liao, H.-Y. M. (2020). Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934.
  • Chen, S., Zhang, J., & Zhan, R. (2020). R2FA-Det: Delving into high-quality rotatable boxes for ship detection in SAR images. Remote Sensing, 12(12), 2031. https://www.mdpi.com/2072-4292/12/12/2031
  • Davie, P. J., Guinot, D., & Ng, P. K. (2015). Anatomy and functional morphology of Brachyura. Treatise on Zoology-Anatomy, Taxonomy, Biology. The Crustacea, Volume 9 Part C (2 vols), 11–163.
  • Du, Y., Zhao, Z., Song, Y., Zhao, Y., Su, F., Gong, T., & Meng, H. (2023). Strongsort: Make deepsort great again. IEEE Transactions on Multimedia.
  • Fisheries, F. (2011). Aquaculture department. 2013. Global aquaculture production statistics for the year.
  • Ge, Y., Lin, S., Zhang, Y., Li, Z., Cheng, H., Dong, J., Shao, S., Zhang, J., Qi, X., & Wu, Z. (2022). Tracking and counting of tomato at different growth period using an improving YOLO-deepsort network for inspection robot. Machines, 10(6), 489. https://www.mdpi.com/2075-1702/10/6/489
  • Harbitz, A. (2007). Estimation of shrimp (Pandalus borealis) carapace length by image analysis. ICES Journal of Marine Science, 64(5), 939–944. https://doi.org/10.1093/icesjms/fsm047
  • Hashisho, Y., Dolereit, T., Segelken-Voigt, A., Bochert, R., & Vahl, M. (2021). AI-assisted Automated Pipeline for Length Estimation, Visual Assessment of the Digestive Tract and Counting of Shrimp in Aquaculture Production. Paper presented at the VISIGRAPP (4: VISAPP).
  • He, K., Zhang, X., Ren, S., & Sun, J. (2015). Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9), 1904–1916. https://doi.org/10.1109/TPAMI.2015.2389824
  • Hong Khai, T., Abdullah, S. N. H. S., Hasan, M. K., & Tarmizi, A. (2022). Underwater fish detection and counting using mask regional convolutional neural network. Water, 14(2), 222. https://www.mdpi.com/2073-4441/14/2/222
  • Kandimalla, V., et al. (2022). Automated detection, classification and counting of fish in fish passages with deep learning. Frontiers in Marine Science, 2049.
  • Koushik, C. V., Kamal, R. V., Tarun, C., Teja, K. D., & Manne, S. (2021). An Efficient Algorithm for Prawn Detection and Length Identification. Paper presented at the Proceedings of International Conference on Computational Intelligence and Data Engineering: ICCIDE 2020.
  • Lai, P.-C., Lin, H.-Y., Lin, J.-Y., Hsu, H.-C., Chu, Y.-N., Liou, C.-H., & Kuo, Y.-F. (2022). Automatic measuring shrimp body length using CNN and an underwater imaging system. Biosystems Engineering, 221, 224–235. https://doi.org/10.1016/j.biosystemseng.2022.07.006
  • Lainez, S. M. D., & Gonzales, D. B. (2019). Automated fingerlings counting using convolutional neural network. In 2019 IEEE 4th International Conference on Computer and Communication Systems (ICCCS). IEEE.
  • Lei, X., Fan, Y., Li, K.-C., Castiglione, A., & Hu, Q. (2021). High-precision linearized interpretation for fully connected neural network. Applied Soft Computing, 109, 107572. https://doi.org/10.1016/j.asoc.2021.107572
  • Li, Y., Lu, H., Li, K.-C., Kim, H., & Serikawa, S. (2018). Non-uniform de-scattering and de-blurring of underwater images. Mobile Networks and Applications, 23(2), 352–362. https://doi.org/10.1007/s11036-017-0933-7
  • Liang, W., Long, J., Li, K.-C., Xu, J., Ma, N., & Lei, X. (2021). A fast defogging image recognition algorithm based on bilateral hybrid filtering. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 17(2), 1–16. https://doi.org/10.1145/3391297
  • Loya-Javellana, G. N., Fielder, D. R., & Thorne, M. J. (1995). Foregut evacuation, return of appetite and gastric fluid secretion in the tropical freshwater crayfish, Cherax quadricarinatus. Aquaculture, 134(3–4), 295–306. https://doi.org/10.1016/0044-8486(95)00050-C
  • Lu, H., Li, Y., Serikawa, S., Li, X., Li, J., & Li, K.-C. (2016). 3D underwater scene reconstruction through descattering and colour correction. International Journal of Computational Science and Engineering, 12(4), 352–359. https://doi.org/10.1504/IJCSE.2016.076950
  • Makandar, A., & Halalli, B. (2015). Image enhancement techniques using highpass and lowpass filters. International Journal of Computer Applications, 109(14), 12–15. https://doi.org/10.5120/19256-0999
  • Nguyen, K.-T., Nguyen, C.-N., Wang, C.-Y., & Wang, J.-C. (2020). Two-phase instance segmentation for whiteleg shrimp larvae counting. Paper presented at the 2020 IEEE International Conference on Consumer Electronics (ICCE).
  • Nunes, A., Gesteira, T., & Goddard, S. (1997). Food ingestion and assimilation by the Southern brown shrimp Penaeus subtilis under semi-intensive culture in NE Brazil. Aquaculture, 149(1–2), 121–136. https://doi.org/10.1016/S0044-8486(96)01433-0
  • Nunes, A. J., & Parsons, G. J. (2000). Size-related feeding and gastric evacuation measurements for the Southern brown shrimp Penaeus subtilis. Aquaculture, 187(1–2), 133–151. https://doi.org/10.1016/S0044-8486(99)00386-5
  • Otsu, N. (1979). A threshold selection method from gray-level histograms. IEEE Transactions on Systems, man, and Cybernetics, 9(1), 62–66. https://doi.org/10.1109/TSMC.1979.4310076
  • Parra-Flores, A., Ponce-Palafox, J., Spanopoulos-Hernández, M., & Martinez-Cardenas, L. (2019). Feeding behavior and ingestion rate of juvenile shrimp of the genus Penaeus (Crustacea: Decapoda). Open Access Journal of Science, 3(3), 111–113.
  • Prasad, A., Sumanth, N. V. R., Sivaprasad, P., Sairam, K., & Ajaybabu, K. (2023). A Survey on Automatic Feeder System for Aqua Farming by using Arduino. Paper presented at the 2023 IEEE 8th International Conference for Convergence in Technology (I2CT).
  • Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. Paper presented at the Proceedings of the IEEE conference on computer vision and pattern recognition.
  • Redmon, J., & Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767.
  • Robertson, L., Lawrence, A. L., & Castille, F. (1993). Effect of feeding frequency and feeding time on growth of Penaeus vannamei (Boone). Aquaculture Research, 24(1), 1–6. https://doi.org/10.1111/j.1365-2109.1993.tb00823.x
  • Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. Paper presented at the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18.
  • Sun, Y., Xia, C., Gao, X., Yan, H., Ge, B., & Li, K.-C. (2022). Aggregating dense and attentional multi-scale feature network for salient object detection. Digital Signal Processing, 130, 103747. https://doi.org/10.1016/j.dsp.2022.103747
  • Wang, G., Van Stappen, G., & De Baets, B. (2020). Automated Artemia length measurement using U-shaped fully convolutional networks and second-order anisotropic Gaussian kernels. Computers and Electronics in Agriculture, 168, 105102. https://doi.org/10.1016/j.compag.2019.105102
  • Wang, X., Zheng, S., Zhang, C., Li, R., & Gui, L. (2021). R-YOLO: A real-time text detector for natural scenes with arbitrary rotation. Sensors, 21(3), 888. https://www.mdpi.com/1424-8220/21/3/888
  • Ward, T. M., Mascagni, P., Ban, Y., Rosman, G., Padoy, N., Meireles, O., & Hashimoto, D. A. (2021). Computer vision in surgery. Surgery, 169(5), 1253–1256. https://doi.org/10.1016/j.surg.2020.10.039
  • Weng, T.-H., Chen, Y.-S., Lu, H., Marino, M. D., & Li, K.-C. (2019). On parallelisation of image dehazing with OpenMP. International Journal of Embedded Systems, 11(4), 427–439. https://doi.org/10.1504/IJES.2019.10022123
  • Weng, T.-H., Chiu, C.-C., Hsieh, M.-Y., Lu, H., & Li, K.-C. (2020). Parallelisation of practical shared sampling alpha matting with OpenMP. International Journal of Computational Science and Engineering, 21(1), 105–115. https://doi.org/10.1504/IJCSE.2020.105217
  • Weng, T.-H., Li, K.-C., Yang, Z., & Liu, C. (2020). On the code modernization of shared sampling alpha matting with OpenMP. Future Generation Computer Systems, 107, 177–191. https://doi.org/10.1016/j.future.2019.12.012
  • Weng, T.-H., Wang, T.-X., Hsieh, M.-Y., Jiang, H., Shen, J., & Li, K.-C. (2019b). Parallel fast Fourier transform in SPMD style of Cilk. International Journal of Embedded Systems, 11(6), 778–787. https://doi.org/10.1504/IJES.2019.103998
  • Yang, R., Pan, Z., Jia, X., Zhang, L., & Deng, Y. (2021). A novel CNN-based detector for ship detection based on rotatable bounding box in SAR images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 14, 1938–1958. https://doi.org/10.1109/JSTARS.2021.3049851
  • Yang, R., Wang, G., Pan, Z., Lu, H., Zhang, H., & Jia, X. (2020). A novel false alarm suppression method for CNN-based SAR ship detector. IEEE Geoscience and Remote Sensing Letters, 18(8), 1401–1405. https://doi.org/10.1109/LGRS.2020.2999506
  • Yang, X., Hou, L., Zhou, Y., Wang, W., & Yan, J. (2021). Dense label encoding for boundary discontinuity free rotation detection. Paper presented at the Proceedings of the IEEE/CVF conference on computer vision and pattern recognition.
  • Yang, X., & Yan, J. (2020). Arbitrary-oriented object detection with circular smooth label. Paper presented at the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VIII 16.
  • Yu, G., Chang, Q., Lv, W., Xu, C., Cui, C., Ji, W., Dang, Q., Deng, K., Wang, G., & Du, Y. (2021). PP-PicoDet: A better real-time object detector on mobile devices. arXiv preprint arXiv:2111.00902.
  • Zhang, G., Yan, Y., Tian, Y., Liu, Y., Li, Y., Zhou, Q., Zhou, R., & Li, K.-C. (2019). Water contamination monitoring system based on big data: A case study. International Journal of Computational Science and Engineering, 19(4), 494–505. https://doi.org/10.1504/IJCSE.2019.101894
  • Zhang, L., Zhou, X., Li, B., Zhang, H., & Duan, Q. (2022). Automatic shrimp counting method using local images and lightweight YOLOv4. Biosystems Engineering, 220, 39–54. https://doi.org/10.1016/j.biosystemseng.2022.05.011
  • Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., & Ren, D. (2020). Distance-IoU loss: Faster and better learning for bounding box regression. Paper presented at the Proceedings of the AAAI conference on artificial intelligence.