Full article: Obstacle avoidance and trajectory optimisation for an autonomous vessel utilising MILP path planning, computer vision based perception and feedback control

Formulae display: $MathJax Logo$ ?Mathematical formulae have been encoded as MathML and are displayed in this HTML version using MathJax in order to improve their display. Uncheck the box to turn MathJax off. This feature requires Javascript. Click on a formula to zoom.

Abstract

In this study, we investigated autonomous vessel obstacle avoidance using advanced techniques within the Guidance, Navigation, and Control (GNC) framework. We propose a Mixed Integer Linear Programming (MILP) based Guidance system for robust path planning avoiding static and dynamic obstacles. For Navigation, we suggest a multi-modal neural network for perception, demonstrating the identification of obstacle type, position, and orientation using imaging sensors. Additionally, the paper compares an error-based PID control strategy and a Model Predictive Control (MPC) scheme as well. This evaluation aids in better evaluating their performance and determining their applicability within the GNC scheme. We detail the implementation of these systems, present simulation results, and offer a performance evaluation using an experimental dataset. Our findings, analysed through qualitative discussion and quantitative performance indicators, contribute to advancements in autonomous navigation and the control strategies to achieve it.

Keywords:

1. Introduction

Maritime goods transportation has significantly expanded over the past two decades, with its market share doubling to encompass 90% of global commodity transport (Sánchez-Beaskoetxea et al. Citation2021). Supported by advancements in network infrastructure and vessel efficiency, this modality's growth is expected to continue, rising from $537.07 billion in 2022 to $650.21 billion in 2026–an estimated annual growth rate of approximately 5% (Sánchez-Beaskoetxea et al. Citation2021). However, accompanying this growth is an increased demand for technological improvements. According to statistics from 1999 to 2019, human error contributed to 50% of pollution incidents, 65% of personal injuries, 80% of property damages, and 90% of collisions (Corić et al. Citation2021). Overconfidence, recklessness, fatigue, and inadequate communication are often cited as leading factors. Autonomous navigation offers potential solutions to these issues, potentially enhancing overall safety and efficiency in maritime goods transportation.

To enable autonomous navigation, an autonomous vessel typically integrates three distinct modules: the Guidance, Navigation, and Control (GNC) system, as illustrated in Figure . This paper explores the potential of employing advanced techniques within the GNC subsystems to augment the current standard practices and state-of-the-art in autonomous navigation, specifically in the areas of obstacle vision recognition and avoidance.

Figure 1. Guidance navigation and control system framework (Hepworth et al. Citation2021).

The Guidance System in an autonomous vessel is tasked with global and local path planning. The application of a Mixed Integer Linear Programming (MILP) algorithm, for instance, enables the path-planning optimiser to update the global path to circumvent local obstacles. This technique, which requires the formulation of a cost function and a set of constraints to minimise path deviation amidst static and dynamic obstacles, has proven effective for global trajectory planning across different fields (Schouwenaars et al. Citation2001; Coleman Citation2024). This paper further explores its use in the maritime domain. Alternative approaches, based on deep learning algorithms were taken into consideration as well (Parkes et al. Citation2018; Sivaraj et al. Citation2023). However in the context of autonomous ship navigation, particularly for obstacle avoidance, MILP offers distinct advantages over deep learning approaches. MILP provides deterministic outcomes crucial for safety and regulatory compliance in maritime operations. It excels in efficiently handling multiple constraints and objectives, ensuring optimal pathfinding adaptable at various navigational scenarios. Unlike deep learning, MILP requires less computational power for real-time decision-making and is not heavily reliant on large datasets for training, making it more adaptable to unpredictable maritime environments. Additionally, MILP's robustness against uncertainty, along with its capacity to incorporate safety regulations and its potential of scalability, makes it a more suitable and reliable choice for the specific application in this studies.

Within a Navigation System, perception plays a vital role in autonomous sailing. Obstacles must be detected, and their position and orientation determined (Kim et al. Citation2018). Though point cloud clustering is a common perception approach used by autonomous vehicles (Khalid et al. Citation2013; Kato et al. Citation2015; Wang et al. Citation2019; Choy et al. Citation2015), this research proposes a novel multi-modal neural network architecture to provide high-level situational awareness for collision avoidance purposes. This approach utilises a CNN for object detection from colour images and processes depth data to provide 3D data. The data is then fused to offer a cropped birds-eye view input to a secondary neural network, which is trained specifically to extract the orientation of an obstacle–in this case, a vessel.

Control strategy design forms the third pillar in the GNC framework. In the context of autonomous navigation, its primary role is to measure the internal state of a vessel system (i.e. position, velocity, orientation) and minimise the error between the measurement and the reference provided by the guidance module. Despite the prevalence of Proportional, Integral, and Derivative (PID) controllers in industrial automation (accounting for 95% of closed-loop control algorithms) due to their simple implementation and robustness against external disturbances (Dey and Sen Citation2020), this paper investigates the Model Predictive Control an optimisation-based control technique–as an alternative, used to regulate a process while fulfilling a set of constraints (Haseltalab et al. Citation2020).

While considering the potential of deep learning-based control methods (Parkes et al. Citation2022; Ye et al. Citation2023; Deraj et al. Citation2023), this study has opted for these classical control strategies, recognising their established efficacy in managing the dynamic and uncertain conditions characteristic of an autonomous vessel navigation domain. Their robust frameworks enable handling of variable conditions, a challenge where deep learning approaches often require extensive, context-specific training data to achieve similar levels of reliability. Additionally, the reliance of deep learning on large and diverse datasets for training is a significant limitation, especially in fields where such datasets are scarce or challenging to compile. The computational intensity and time requirements for training deep learning models further enhance this issue, rendering them less practical for the specific applications investigated in this paper.

The rest of this paper is organised as follows: Section 2 introduces the proposed GNC scheme, with each sub-system discussed in its own sub-section. Section 3 presents the results and performance evaluations of the individual sub-systems, gleaned from both simulated and experimental testing. Finally, Section 4 summarises the paper's findings and offers conclusions and recommendations for future work. This research paper is an expanded and enhanced version of our previous work, which was presented at the International Ship Control Systems Symposium (iSCSS) in Delft, The Netherlands, in November 2022 (Garofano et al. Citation2022).

2. Methodology

2.1. Guidance strategy

The general formulation of a Mixed Integer Linear Programming problem can be formulated as (Vielma Citation2015; Richards et al. Citation2002): (1) $\begin{aligned} min & Z = c^{T} x \\ s . t ., \\ A x \leq b \\ x = (x_{1}, \dots, x_{n}, i_{1}, \dots, i_{k}, w_{1}, \dots, w_{j})^{T} \\ x_{n}, w_{j} \in R, i_{k} \in Z \end{aligned}$ (1) Here, $x$ denotes the vector containing the variables to be optimised and $c^{T}$ corresponds to a coefficient vector. The optimisation vector is comprised of continuous real variables $x_{n}$ and integer variables $i_{k}$ and a number of slack variables are introduced $w_{j}$ . The slack variables are introduced to transform hard constraints into soft constraints that are penalised for in the cost function. This proves useful in the context of path planning as will become evident. The linear matrix inequality $A x \leq b$ corresponds to the constraints on the optimisation variables.

In the context of path planning, the continuous variables that are to be optimised are defined assuming an a-priori planned path for the vessel denoted by the vector $η_{ref} (t)$ and the optimisation solution $η_{sol} (t)$ . One may define the state tracking error of the solution by relationship in Equation (Equation2(2) $\begin{aligned} x_{n} (t) = η_{sol} (t) - η_{ref} (t) \end{aligned}$ (2) ). For maneuvering purposes, a vessel's pose is uniquely defined by its 2-dimensional position in NED-coordinates and its heading angle as in Equation (Equation3(3) $\begin{aligned} η (t) = (Y X ψ)^{T} \end{aligned}$ (3) ). For the purpose of path deviation minimisation, the heading angle is neglected and the error state is limited to the reduced 2-dimensional error vector defined by Equation (Equation4(4) $\begin{aligned} x_{n} (t) = (\begin{matrix} | Y_{sol} (t) - Y_{ref} (t) | \\ | X_{sol} (t) - X_{ref} (t) | \end{matrix}) \end{aligned}$ (4) ). (2) $\begin{aligned} x_{n} (t) = η_{sol} (t) - η_{ref} (t) \end{aligned}$ (2) (3) $\begin{aligned} η (t) = (Y X ψ)^{T} \end{aligned}$ (3) (4) $\begin{aligned} x_{n} (t) = (\begin{matrix} | Y_{sol} (t) - Y_{ref} (t) | \\ | X_{sol} (t) - X_{ref} (t) | \end{matrix}) \end{aligned}$ (4) This $l_{2}$ norm may be transformed to a $l_{1}$ norm suitable for linear optimisation by introducing the augmented error system in Equation (Equation5(5) $\begin{aligned} [\begin{matrix} \tilde{Y} (t) \\ - \tilde{Y} (t) \\ \tilde{X} (t) \\ - \tilde{X} (t) \end{matrix}] = [\begin{matrix} (Y_{sol} (t) - Y_{ref} (t)) \\ - (Y_{sol} (t) - Y_{ref} (t)) \\ (X_{sol} (t) - X_{ref} (t)) \\ - (X_{sol} (t) - X_{ref} (t)) \end{matrix}] \end{aligned}$ (5) ). This error system can be transformed into a system of Linear Matrix Inequalities (LMI) that are to be optimised. For the purpose of path deviation minimisation, these are transformed into the LMIs in Equation (Equation6(6) $\begin{aligned} [\begin{matrix} 1 & 0 & - 1 & 0 \\ - 1 & 0 & - 1 & 0 \\ 0 & 1 & 0 & - 1 \\ 0 & - 1 & 0 & - 1 \end{matrix}] [\begin{matrix} \tilde{Y} (t) \\ \tilde{X} (t) \\ w_{\tilde{Y}} (t) \\ w_{\tilde{X}} (t) \end{matrix}] \leq [\begin{matrix} 0 \\ 0 \\ 0 \\ 0 \end{matrix}] \end{aligned}$ (6) ) as the first set of constraints used in the optimisation problem. Where $w_{\tilde{Y}} (t)$ and $w_{\tilde{X}} (t)$ represent the slack variables that are introduced at every time step. (5) $\begin{aligned} [\begin{matrix} \tilde{Y} (t) \\ - \tilde{Y} (t) \\ \tilde{X} (t) \\ - \tilde{X} (t) \end{matrix}] = [\begin{matrix} (Y_{sol} (t) - Y_{ref} (t)) \\ - (Y_{sol} (t) - Y_{ref} (t)) \\ (X_{sol} (t) - X_{ref} (t)) \\ - (X_{sol} (t) - X_{ref} (t)) \end{matrix}] \end{aligned}$ (5) (6) $\begin{aligned} [\begin{matrix} 1 & 0 & - 1 & 0 \\ - 1 & 0 & - 1 & 0 \\ 0 & 1 & 0 & - 1 \\ 0 & - 1 & 0 & - 1 \end{matrix}] [\begin{matrix} \tilde{Y} (t) \\ \tilde{X} (t) \\ w_{\tilde{Y}} (t) \\ w_{\tilde{X}} (t) \end{matrix}] \leq [\begin{matrix} 0 \\ 0 \\ 0 \\ 0 \end{matrix}] \end{aligned}$ (6) Hence, the hard constraints enforcing the path deviation are transformed into soft constraints by introducing the slack variables. These slack variables are thus included in the optimisation vector (Equation4(4) $\begin{aligned} x_{n} (t) = (\begin{matrix} | Y_{sol} (t) - Y_{ref} (t) | \\ | X_{sol} (t) - X_{ref} (t) | \end{matrix}) \end{aligned}$ (4) ). By setting the values in the coefficient vector as stated in (Equation1(1) $\begin{aligned} min & Z = c^{T} x \\ s . t ., \\ A x \leq b \\ x = (x_{1}, \dots, x_{n}, i_{1}, \dots, i_{k}, w_{1}, \dots, w_{j})^{T} \\ x_{n}, w_{j} \in R, i_{k} \in Z \end{aligned}$ (1) ) to appropriate values, one can minimise for the value of the slack variables and thus for path deviation. For this purpose, these coefficient values are set to positive, real values. Moreover, the vector elements in $c^{T}$ corresponding to the errors themselves $\tilde{Y} (t)$ and $\tilde{X} (t)$ are set to zero.

Having established the framework for path deviation minimisation, additional constraints can be introduced. For the purpose of enforcing a maximum speed constraint, LMI in Equation (Equation7(7) $\begin{aligned} [\begin{matrix} 1 & 0 \\ - 1 & 0 \\ 0 & 1 \\ 0 & - 1 \end{matrix}] [\begin{matrix} Y_{sol} (t + Δt) - Y_{sol} (t) \\ X_{sol} (t + Δt) - X_{sol} (t) \end{matrix}] \\ \leq [\begin{matrix} (V_{Y +}^{max} * Δt) - (Y_{ref} (t + Δt) - Y_{ref} (t)) \\ - (V_{Y -}^{max} * Δt) + (Y_{ref} (t + Δt) - Y_{ref} (t)) \\ (V_{X +}^{max} * Δt) - (X_{ref} (t + Δt) - X_{ref} (t)) \\ - (V_{X -}^{max} * Δt) + (X_{ref} (t + Δt) - X_{ref} (t)) \end{matrix}] \end{aligned}$ (7) ) is formulated for the linearised maximum speed constraints. The maximum speeds in both positive and negative surge and sway directions are assumed to be known in this application. Furthermore $Δt$ is a designed parameter that defines the time interval between solutions of the optimisation algorithm. The value of this time interval subsequently determines the update frequency of the algorithm and must be chosen accordingly. (7) $\begin{aligned} [\begin{matrix} 1 & 0 \\ - 1 & 0 \\ 0 & 1 \\ 0 & - 1 \end{matrix}] [\begin{matrix} Y_{sol} (t + Δt) - Y_{sol} (t) \\ X_{sol} (t + Δt) - X_{sol} (t) \end{matrix}] \\ \leq [\begin{matrix} (V_{Y +}^{max} * Δt) - (Y_{ref} (t + Δt) - Y_{ref} (t)) \\ - (V_{Y -}^{max} * Δt) + (Y_{ref} (t + Δt) - Y_{ref} (t)) \\ (V_{X +}^{max} * Δt) - (X_{ref} (t + Δt) - X_{ref} (t)) \\ - (V_{X -}^{max} * Δt) + (X_{ref} (t + Δt) - X_{ref} (t)) \end{matrix}] \end{aligned}$ (7) Central to the goal of this guidance system is the incorporation of an obstacle avoidance strategy which is introduced in the form of a MILP problem. Once the positions of static and dynamic obstacles are known, the avoidance constraints can be formulated. The space to be avoided is formulated here such that a rectangular box is constructed around obstacles, with $i_{k} \in [0, 1]$ and $X_{obs}^{min} (t)$ and $Y_{obs}^{min} (t)$ denoting the minimum coordinates of the obstacle. For safety purposes, obstacles are to be avoided by some safety distance, this is taken into account in the strategy proposed by adding a constant M, in metres, that correspond to a safe margin, i.e. the area around the obstacle that is identified as a no-navigation zone. The obstacle boundaries can then be extended to act as a safety factor as required. The constraint in Equation (Equation8(8) $\begin{aligned} i_{1} + i_{2} + i_{3} + i_{4} \leq 3 \end{aligned}$ (8) ) is added to guarantee that at least one of the constraints is satisfied and a feasible path in any of the given directions can be generated at all times. This results in the LMI defined in Equation (Equation9(9) $\begin{aligned} [\begin{matrix} 1 & 0 & - M & 0 & 0 & 0 \\ - 1 & 0 & 0 & - M & 0 & 0 \\ 0 & 1 & 0 & 0 & - M & 0 \\ 0 & - 1 & 0 & 0 & 0 & - M \\ 0 & 0 & 1 & 1 & 1 & 1 \end{matrix}] [\begin{matrix} Y_{sol} (t) \\ X_{sol} (t) \\ i_{1} \\ i_{2} \\ i_{3} \\ i_{4} \end{matrix}] \\ \leq [\begin{matrix} {\tilde{Y}}_{obs}^{min} (t) - Y_{ref} (t) \\ - {\tilde{Y}}_{obs}^{max} (t) + Y_{ref} (t) \\ {\tilde{X}}_{obs}^{min} (t) - X_{ref} (t) \\ - {\tilde{X}}_{obs}^{max} (t) + X_{ref} (t) \\ 3 \end{matrix}] \end{aligned}$ (9) ). (8) $\begin{aligned} i_{1} + i_{2} + i_{3} + i_{4} \leq 3 \end{aligned}$ (8) (9) $\begin{aligned} [\begin{matrix} 1 & 0 & - M & 0 & 0 & 0 \\ - 1 & 0 & 0 & - M & 0 & 0 \\ 0 & 1 & 0 & 0 & - M & 0 \\ 0 & - 1 & 0 & 0 & 0 & - M \\ 0 & 0 & 1 & 1 & 1 & 1 \end{matrix}] [\begin{matrix} Y_{sol} (t) \\ X_{sol} (t) \\ i_{1} \\ i_{2} \\ i_{3} \\ i_{4} \end{matrix}] \\ \leq [\begin{matrix} {\tilde{Y}}_{obs}^{min} (t) - Y_{ref} (t) \\ - {\tilde{Y}}_{obs}^{max} (t) + Y_{ref} (t) \\ {\tilde{X}}_{obs}^{min} (t) - X_{ref} (t) \\ - {\tilde{X}}_{obs}^{max} (t) + X_{ref} (t) \\ 3 \end{matrix}] \end{aligned}$ (9) The linear programming framework also requires an equilibrium solution in the form of a Linear Matrix Equality (LME). In the use case presented here, the equilibrium solution that can be used is the current state of the vessel, i.e. $x (t = 0) = x_{0} = (Y_{0} X_{0})^{T}$ This results in the Linear Matrix Equality in Equation (Equation10(10) $[\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix}] [\begin{matrix} {\tilde{Y}}_{0} \\ {\tilde{X}}_{0} \end{matrix}] = [\begin{matrix} η_{Y} - Y_{ref} \\ η_{X} - X_{ref} \end{matrix}]$ (10) ) with $η_{Y}$ and $η_{X}$ corresponding to the current vessel position. (10) $[\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix}] [\begin{matrix} {\tilde{Y}}_{0} \\ {\tilde{X}}_{0} \end{matrix}] = [\begin{matrix} η_{Y} - Y_{ref} \\ η_{X} - X_{ref} \end{matrix}]$ (10) These separate matrices can now be concatenated into one single LMI and formulated as a single MILP problem as in Equation (Equation11(11) $\begin{aligned} A_{total} (t) x (t) \leq b_{total} (t) \end{aligned}$ (11) ). The main objective of the MILP approach proposed is to minimise the path deviation and avoiding obstacles within a finite time horizon. The final product yields the system of LMIs in Equation (Equation12(12) $\begin{aligned} [\begin{matrix} A_{total} (t) \\ A_{total} (t + Δt) \\ ⋮ \\ A_{total} (t + k Δt) \end{matrix}] [\begin{matrix} x (t) \\ x (t + Δt) \\ ⋮ \\ x (t + k Δt) \end{matrix}] \leq [\begin{matrix} b_{total} (t) \\ b_{total} (t + Δt) \\ ⋮ \\ b_{total} (t + k Δt) \end{matrix}] \end{aligned}$ (12) ), with $k Δt$ representing the time horizon. Using an MILP solver, a solution for the values of $[x (t), \dots, x (t + k Δt)]$ can be calculated. These are then used to generate a new set of way points that serve as input for the control system. (11) $\begin{aligned} A_{total} (t) x (t) \leq b_{total} (t) \end{aligned}$ (11) (12) $\begin{aligned} [\begin{matrix} A_{total} (t) \\ A_{total} (t + Δt) \\ ⋮ \\ A_{total} (t + k Δt) \end{matrix}] [\begin{matrix} x (t) \\ x (t + Δt) \\ ⋮ \\ x (t + k Δt) \end{matrix}] \leq [\begin{matrix} b_{total} (t) \\ b_{total} (t + Δt) \\ ⋮ \\ b_{total} (t + k Δt) \end{matrix}] \end{aligned}$ (12)

2.2. Navigation system

Situational awareness is achieved through the implementation of a vision system specifically configured to generate data pertinent to the path planning technique, being capable of providing a constant update of environmental conditions in the presence of both static and dynamic obstacles. The perception approach presented hereon is developed to complement the capacity of the MILP based guidance system to make use of a constantly updating environment, whilst further providing a reliable estimation of orientation for both static and dynamic obstacles alike.

The approach to perception proposed in this paper comes in the form of a multi-modal neural network whose architecture is presented in Figure . A single stereovision device is deployed in this system, namely the Intel© RealSense $^{™}$ D435i which provides both colour images through the RGB feed and stereo-depth frames via a depth feed. These two feeds act as the input to the perception system where they are subsequently processed individually with the results then being fused. Both feeds are configured to a resolution of 848x480 and 30 frames per second. It is worth noting that other sensors capable of providing depth data could also be used within this network structure.

The colour frames supplied by the RGB feed are utilised for obstacle detection, achieved through the application of a Conventional Neural Network trained to detect obstacles within the experimental environment, referred to in this work as the ‘VesselNet’. The VessselNet is built upon the Faster R-CNN framework, with an Inception 101 feature extractor, with proposals limited to 50 to allow for an adequate trade-off between speed and accuracy.For further details on this algorithm and the training choices, the reader is directed to the work of (Huang et al. Citation2016). Whilst this specific model, therefore the details of the algorithm shall not be any further divulged in this work. The key outputs of this network are the bounding box and classification of each obstacle detected in the frame. The deep learning methodology, particularly concerning the training, testing, and validation splits, as well as the variety and generalisation of the training set, takes inspiration from the approach used in (Hepworth et al. Citation2021). In this model, similar to the one presented in the other study, a dataset of 1500 annotated colour frames was used for training. This dataset includes a diverse range of model-scale tugboats and offshore service vessels, as depicted in Figures and . The dataset was divided into training, validation, and test segments to ensure a robust and general model, in line with best practices established in the field.

Figure 2. ‘Tito Neri’ model scale tugboat.

Figure 3. ‘Grey Seabax’ model scale offshore vessel.

Figure 4. Multimodal neural network architecture for perception tasks.

In parallel to the obstacle detection procedure, the depth feed is processed through the application of filters to improve the usability of the data. A breakdown of the filter process can be seen in Figure , where the steps are encompassed by the dashed blue box. As the depth frames generated by the stereovision setup are not inherently aligned with the images from colour sensor, this must be achieved pragmatically. The frames are aligned by translating the depth pixels to a synthetic viewport which matches the field of view of the colour sensor. This allows for depth to be directly extracted from a ROI identified in a colour frame. This aligned depth frame then requires some further processing, starting with a decimation filter. Decimating the depth is a sub-sampling technique to smooth the data, improving later point-cloud visualisation. Spatial and temporal filters are applied thereafter to improve edge sharpness and remove noise. Finally any dead-pixels that remain in the depth frame after decimation are accounted for by a hole-filling operation. This operation replaces each instance of a dead pixel by the largest depth value from its surrounding neighbours, assigning it an estimated value.

In order to generate a 2D birds-eye-view (BEV) plot as an input for the second neural network, the obstacle detection data and the depth data need to first be fused and processed. Figure shows a breakdown of the fusion procedure in the dashed red box. The obstacle bounds supplied by the VesselNet are utilised to extract the region of interest (ROI) from the processed depth data creating a cropped depth matrix. A point-cloud is then generated by projecting each pixel to a point within 3D space with the known device intrinsics. Median clustering is applied to the raw point-cloud to remove any outliers. This is specifically achieved through the exclusion of points outside a depth radius of half the vessel beam from the median point. This vessel beam is estimated from a look up table based upon the vessel classification from the VesselNet. Finally, the fusion stage generates a BEV plot of the processed point-cloud data (Beltran et al. Citation2018). This BEV plot is created in the form of a 3-channel image, encoding the vertical measurement that was lost when reducing to two dimensions as a colour-map.

The second Convolutional Neural Network, ‘BearingNet’ is tasked with extracting an estimate of yaw angle from the generated BEV plot. This network is built upon the RetinaNet framework (Lin et al. Citation2017, Citation2020) with the ResNet50 feature extractor, which on top of conducting traditional 2D object detection, is also endowed with the capacity to infer object orientation. This capacity is achieved by generating anchor boxes at various angles whilst still applying the usual factors of scale and aspect ratio. This addition not only adds computational load to the object detector, but also alters the method for calculating intersection over union (IoU). The problem of IoU calculation is solved by applying a sequential cutting method which recursively assesses the intersection of edges between two boxes. The former issue of computational load is rectified through the parallel processing using the CUDA integration available within the NVIDIA object detection toolkit with which this network was trained. As with the first object detection neural network, this 'BearingNet' provides bounding box coordinates and a classification as well as the new angle attribute. The bounding box angle (θ) produced is measured anti-clockwise from the x-axis of the BEV plot and constrained between $- π$ and π. At this stage of research, the BearingNet is trained on a dataset of 750 BEV plot images specific to the offshore model-scale vessel only.

The final procedure in the perception approach is to post-process the output of the BearingNet and the point-cloud from the fusion stage. This is processing involves a simple translation from the BEV plot pixel frame to the real-world coordinate frame and transforming the angle to align with that of the autonomous vessel. The final outputs of the multi-modal network are the coordinates of the obstacle bounds, the bearing of the obstacle and the vessel classification. All three attributes of particular use to a guidance system conducting collision avoidance.

2.3. Control design

In this section, we outline the Control Design methodology for our Guidance, Navigation, and Control (GNC) scheme. We leverage a Model-Based design approach, exploiting analytical solutions for the control problem while examining the dynamics of a ship system. Our focus will be on two distinct control methods.

The first method stems from classical control theory, wherein we'll scrutinise the fundamental characteristics of a vessel's mathematical model and its interaction with the derived control law. The second method is rooted in the field of optimal control, specifically Model Predictive Control (MPC). Here, the presence of a cost function and constraints allows the designer to better balance between performance and control effort.

The dynamics of a vessel navigating on water, subjected to environmental disturbances like wind, waves, and currents, involves six degrees of freedom. The equations of motion can be obtained using either the Newton-Euler or the Lagrange equations.

A vessel hull of constant mass m and centre of gravity ( $x_{g}, y_{g}, z_{g}$ ) can be described by the following set of coupled differential equations (Fossen Citation2011): (13) ${\begin{cases} m [\dot{u} - vr + wq - x_{g} (q^{2} + r^{2}) + y_{g} (pq - \dot{r}) + z_{g} (pr + \dot{q})] = X \\ m [\dot{v} - wp + ur - y_{g} (r^{2} + p^{2}) + z_{g} (qr - \dot{p}) + x_{g} (qp + \dot{r})] = Y \\ m [\dot{w} - uq + vp - z_{g} (p^{2} + q^{2}) + x_{g} (rp - \dot{q}) + y_{g} (rq + \dot{p})] = Z \\ I_{x} \dot{p} + (I_{z} - I_{y}) qr - (\dot{r} + pq) I_{xz} + (r^{2} - q^{2}) I_{yz} + (pr - \dot{q}) \\ I_{xy} + m [y_{g} (\dot{w} - uq + vp) - z_{g} (\dot{v} - wp + ur)] = K \\ I_{y} \dot{q} + (I_{x} - I_{z}) rp - (\dot{p} + qr) I_{xy} + (p^{2} - r^{2}) I_{zx} + (qp - \dot{r}) \\ I_{yz} + m [z_{g} (\dot{u} - vr + wq) - x_{g} (\dot{w} - uq + vp)] = M \\ I_{z} \dot{r} + (I_{y} - I_{x}) pq - (\dot{q} + rp) I_{yz} + (q^{2} - p^{2}) I_{xy} + (rq - \dot{p}) \\ I_{zx} + m [x_{g} (\dot{v} - wp + ur) - y_{g} (\dot{u} - vr + wq)] = N \end{cases}$ (13) In these equations, X, Y, Z, K, M, N denote the external forces and moments, $u, v, r, p, q, w$ represent the linear and angular body velocities, and $m, I_{* *}$ symbolise the mass and inertia tensor of the vessel. This model is one of the most comprehensive to describe the time-domain dynamic evolution of a marine craft.

However, a vectorial representation of the equations of motion could be adopted to better exploit the physical properties of the model. This representation offers distinct advantages, such as revealing specific mathematical system properties and allowing for a reduction in the number of coefficients required for control by rearranging the equation to yield semi-definite positive and skew-symmetric matrices (Fossen Citation2011; Sciavicco et al. Citation1998).

The compact vessel model can then be expressed as: (14) $M \dot{ν} + C (ν) ν + D ν = τ .$ (14) In this equation, ν represents the generalised velocities vector $ν = [u, v, r, w, p, q, r]^{'}$ , $τ = [X, Y, Z, K, M, N]$ stands for the generalised force vector acting on the craft, and M, C, and D are respectively the rigid-body inertia matrix, the Coriolis and centripetal forces matrix, and the Damping matrix.

An alternative to the 6DoF vector formalism, shown in Equation (Equation14(14) $M \dot{ν} + C (ν) ν + D ν = τ .$ (14) ), is to apply manoeuvring theory for a three degrees of freedom (3DoF) representation of a marine vessel's motion. This 3DoF model focuses on the horizontal motion of a ship in surge, sway, and yaw, so the state vector becomes $ν = [u, v, r]^{'}$ . Therefore, the motion associated with heave, roll, and pitch is ignored, i.e. w = p = q = 0. This simplified model garners our attention in this paper, since our primary control design goal is enabling a vessel to autonomously navigate a predetermined path on an xy Cartesian plane.

The section that follows delves into a more comprehensive analysis of ship dynamic manoeuvring based on the 3DoF model. We aim to demonstrate how a system of equation of different mathematical models, each with its unique mathematical properties, can be used to formulate an appropriate control law. We can categorise the different sailing scenarios of a surface vessel as follows:

Unmooring: This is the initial stage where a ship begins its departure. Here, surge, sway, and yaw velocities are minimal and approximately equal: $u = v = r \approx 0$ .
Low-velocity manoeuvrings: This stage precedes actual navigation towards the destination, typically taking place in a harbour environment. The ship executes slow maneuvers to orient towards its destination: $u = v = r < ϵ$ .
Sailing at cruise speed: At this stage, the vessel is aligned towards the destination and navigates at cruise speed. The surge velocity dominates, and the heading change is relatively small compared to it: $u = u_{0}; v \approx 0; r < ϵ$ .

To accurately represent these different navigation scenarios mathematically, we propose the following model: (15) $\begin{aligned} M \dot{ν} + C (ν) ν + D ν = τ u \approx v \approx r \end{aligned}$ (15) (16) $\begin{aligned} {\begin{cases} u = \sqrt{u^{2} + v^{2}} \\ T \ddot{ψ} + \dot{ψ} = Kδ \end{cases} u ≫ v ≫ r \end{aligned}$ (16) Equation (Equation16(16) $\begin{aligned} {\begin{cases} u = \sqrt{u^{2} + v^{2}} \\ T \ddot{ψ} + \dot{ψ} = Kδ \end{cases} u ≫ v ≫ r \end{aligned}$ (16) ), also known as the Nomoto Model (Nomoto Citation1967), represents a yaw subsystem derived from the manoeuvring model (Equation14(14) $M \dot{ν} + C (ν) ν + D ν = τ .$ (14) ) by selecting the yaw rate r as the output. This model illustrates the relationship between a vessel's heading and the propeller angle when the vessel sails at a constant surge speed $u_{0}$ . Given that it's a linear single-input-single-output (SISO) model, it lends itself to a frequency response analytical PID implementation (Aström and Murray Citation2004).

Figure portrays a standard negative feedback control architecture. The transfer function of the controller is represented by C, while P denotes the vessel dynamic model. The open-loop transfer function is expressed as $L = C \times P$ , and the closed-loop correlation between the reference input r and the output y is $W = \frac{L}{1 + L}$ . The approach in this paper aims to define C such that: (17) $W = \frac{L}{1 + L} = \frac{K_{dc} ω_{n}^{2}}{s^{2} + 2 ξ ω_{n} s + ω_{n}^{2}}$ (17) where the system's settling time $T_{s}$ , rise time $T_{r}$ and control damping ξ can be controlled using the following formula (Doyle et al. Citation1990): (18) $T_{s} \approx \frac{4.6 ξ}{ω_{n}}; T_{r} \approx \frac{1.8}{ω_{n}}; ξ = \frac{- \ln (M_{p})}{\sqrt{π^{2} + l n^{2} (M_{p})}}$ (18) The primary objective of the control design is to achieve:

Zero steady-state error between r and y, implying that the ship successfully reaches the destination coordinates as specified by the Guidance strategy.
Zero overshoot $M_{p}$ when following the heading reference.
Settling time $T_{s}$ and Rise time $T_{r}$ within the limits of what's feasible, considering the propeller power capabilities. In the particular case of this study, the control law is designed to satisfy: $T_{s} < 3$ sec; $T_{r} < 4$ sec, achievable by the propulsion of the Tito Neri model scale vessel.

Figure 5. Feedback loop generalised block-scheme.

We also explore Model Predictive Control (MPC) in this paper. Here, MPC is grounded on the iterative optimisation of a mathematical model of a vessel over a finite time horizon. The present state of the plant is sampled at time t, and a cost-minimising control strategy is computed using a numerical minimisation technique. This is assessed over a fixed future time horizon, $[t, t + T]$ . The predictive nature of MPC enables the control algorithm to find an analytical solution, introducing a beneficial feature to the overall Control architecture: the ability to define a trade-off between plant output and the required control effort.

The model in Equation (Equation14(14) $M \dot{ν} + C (ν) ν + D ν = τ .$ (14) ) needs to be modified, and the overall MPC problem can be formulated as follows: (19) ${\begin{cases} \dot{ν} = (\begin{matrix} \frac{C (ν)}{M} + \frac{D}{M} \end{matrix}) ν + Bτ \\ y = Cν \end{cases}$ (19) (20) $\begin{aligned} J = min & (R_{s} - Y)^{T} (R_{s} - Y) + Δ τ^{T} \bar{R} Δτ \\ s . t . & τ_{min} \leq τ \leq τ_{max} \\ {\dot{τ}}_{min} \leq \dot{τ} \leq {\dot{τ}}_{max} \end{aligned}$ (20) In the above equations, $R_{s}$ is the data vector comprising the system set-point values, i.e. the state values the ship system aims to achieve. $\bar{R}$ is a diagonal matrix represented as $\bar{R} = r_{w} I_{N_{c} \times N_{c}}$ $(r_{w} \geq 0)$ , where $r_{w}$ serves as a tuning parameter for the desired closed-loop performance. Furthermore, Y is the vector containing the predicted output variables over the prediction horizon, and $Δτ$ is the vector containing the future control trajectory as computed by the MPC solver algorithm. (21) $\begin{aligned} Y = [y (k_{i} + 1 | k_{i}), y (k_{i} + 2 | k_{i}), \dots, y (k_{i} + N_{p} | k_{i})]^{T} \end{aligned}$ (21) (22) $\begin{aligned} Δτ = [Δτ (k_{i}), Δτ (k_{i} + 1), \dots, Δτ (k_{i} + Nc - 1)] \end{aligned}$ (22) The optimisation problem in Equation (Equation20(20) $\begin{aligned} J = min & (R_{s} - Y)^{T} (R_{s} - Y) + Δ τ^{T} \bar{R} Δτ \\ s . t . & τ_{min} \leq τ \leq τ_{max} \\ {\dot{τ}}_{min} \leq \dot{τ} \leq {\dot{τ}}_{max} \end{aligned}$ (20) ), which minimises the error between the vessel's current state and the reference provided by the Guidance strategy, will be solved using commercially available software. The structure of the algorithm is outlined in the pseudo-code Algorithm 1.

While the solution is derived using commercial software, the pseudo code presented is used for illustrating the approach in the paper. It specifically highlights when certain operations should be carried out in the algorithm's sequence. Crucially, it details the specific types of inputs required by the MPC control strategy at different stages.

3. Results

3.1. Guidance strategy

The goal of the guidance strategy is to provide a reference route to an autonomous surface vessel in order to reach the destination coordinates avoiding all the obstacles in the environment. The starting point is at location (7,4) and destination at (22,175) in a Cartesian x, y plane. Without loss of generality, all the coordinates are expressed in decimetres in this section, due to the size of the model scale vessel. Three obstacles are present in the environment and the guidance strategy calculates the path between the initial and the final way-points as illustrated in Figure . The safety distance factor M to formulate constraints (Equation9(9) $\begin{aligned} [\begin{matrix} 1 & 0 & - M & 0 & 0 & 0 \\ - 1 & 0 & 0 & - M & 0 & 0 \\ 0 & 1 & 0 & 0 & - M & 0 \\ 0 & - 1 & 0 & 0 & 0 & - M \\ 0 & 0 & 1 & 1 & 1 & 1 \end{matrix}] [\begin{matrix} Y_{sol} (t) \\ X_{sol} (t) \\ i_{1} \\ i_{2} \\ i_{3} \\ i_{4} \end{matrix}] \\ \leq [\begin{matrix} {\tilde{Y}}_{obs}^{min} (t) - Y_{ref} (t) \\ - {\tilde{Y}}_{obs}^{max} (t) + Y_{ref} (t) \\ {\tilde{X}}_{obs}^{min} (t) - X_{ref} (t) \\ - {\tilde{X}}_{obs}^{max} (t) + X_{ref} (t) \\ 3 \end{matrix}] \end{aligned}$ (9) ) is chosen as two times the ship length ( $2 L_{ship}$ ) and it is represented as a black area around the obstacles.

Figure 6. Global MILP trajectory planning.

As shown in the plot, the algorithm provides a pathway whilst minimising the deviation from the obstacle free areas. In this test the obstacles are visible within the prediction horizon between the initial and final position, demonstrating how the MILP algorithm approaches a case of static obstacle avoidance. This is an example of a global trajectory planner, where the optimisation is done over the entire path.

The computational time of the global path planner was measured and equal to $T_{computation} \approx 12.3 s$ .

3.2. Navigation system

The performance of the perception system is evaluated through feeding a validation dataset into the multi-modal network and reviewing the resultant output, as well as subjecting some focus on individual tasks to identify sub-task weaknesses. The validation dataset is comprised of 100 data points gathered within the indoor tank environment, with each data point providing a colour frame and a depth frame to be inputted to the network for inference.

The sub-figures presented in Figure present an example of the visual results from the major stages throughout the multi-modal network. Each sub-figure is referenced to in the labels within Figure . An output from inference on the VesselNet CNN can be seen in Figure (a) where the inputted colour frame has been annotated with the detected object bounds and classification. The processed depth data can be visualised in Figure (b) as a colour-mapped image, with the pixels aligned to the colour frame and the colour-map range encoding the depth range. The fusion of the first stages creates a birds-eye-view plot such as that presented in Figure (c). The resultant obstacle bounds and angle from BearingNet CNN are then used to create a scaled occupancy map, as presented in Figure (d).

Figure 7. Perception system results. (a) Output from VesselNet neural network, (b) Colour-map visualisation of processed depth data, (c) Birds-eye-view plot of fused data and (d) Planar occupancy map – Grid size: 1m, coordinate origin marked by white dot..

The VesselNet CNN demonstrated strong performance in detecting the vessel and distinguishing its class. Quantitative evaluation using the recall performance indicator defined in Equation (Equation23(23) $Recall = \frac{correct detections}{all ground truths} = 98 %$ (23) ) yields a result of 0.98, meaning that the VesselNet CNN correctly detected and classified the vessels 98% of the time within the validation dataset. As the bounding box from this network is only used to attain a ROI for ongoing tasks, the bounding box precision is not of critical importance and is therefore not quantitatively evaluated in this work however it is observed to be more than satisfactory to fulfil its role. Furthermore in a collision avoidance application, vigilance leans in favour of heightened recall at the sacrifice of precision.

The extraction and processing of depth data to create the point-cloud BEV plot also proved itself to be robust. The hardware limitation of the stereovision device does however become clear beyond approximately eight metres with the resultant point cloud being of low resolution, producing a sparse BEV plot. The multimodal architecture inherently leads to overall system performance being dictated by the weakest link in the chain, which in this case proved to be the BearingNet. With a recall of 0.59 at an angular resolution of $\frac{π}{8}$ , the BearingNet does not maintain the performance level of its predecessors yet does still demonstrate promise in this proof-of-concept application. (23) $Recall = \frac{correct detections}{all ground truths} = 98 %$ (23) This method allows to determine both the position and the heading of obstacles relative to the controlled vessel frame of reference. For the guidance algorithm, once known the position of the controlled vessel, it's straightforward to calculate the global position of any obstacles nearby using basic geometry and a rotation matrix.

3.3. Control Design

Results are obtained using one of the model scale vessel available at ResearchLab Autonomous Shipping (RAS) of the Delft University of Technology, a Tugboat vessel with a scale factor 1:30 called Tito Neri (). A Tito Neri has three propeller, two stern $360^{\circ}$ azimuth thrusters and a tunnel bow thruster. It is a fully actuated ship system with respect to the Cartesian plane since, based on its thrust allocation, it is possible to provide thrust in all of the 3DoF. The ship is equipped with the hardware and sensors summarised in Table .

Figure 8. Tito Neri's thrust allocation.

Table 1. Tiro Neri's hardware ad sensor.

Download CSV Display Table

The first step is to identify the Nomoto Model parameters in (Equation16(16) $\begin{aligned} {\begin{cases} u = \sqrt{u^{2} + v^{2}} \\ T \ddot{ψ} + \dot{ψ} = Kδ \end{cases} u ≫ v ≫ r \end{aligned}$ (16) ). This is achieved with experimental measurement campaigns using the High Precision Positioning Feedback Camera system in the Towing tank lab facility of the Maritime and Transport Technology Department at Delft University of Technology. The tests were performed reaching steady surge velocity on the Tito Neri model scale ship and injecting at time $t_{0}$ a deviation of $45^{\circ}$ angle on the stern propellers. The experiments were repeated 10 times and the separate results were compared and averaged in order to reduce the amount of uncertainties during the test phase.

Figure show the evolution over time of the heading rotational velocity r measured by the camera system and the simulated output of the mathematical Nomoto model response were extracted using nonlinear least-squares fit method.

Figure 9. Estimated nomoto model using nonlinear least-square fit.

The parameters where then calculated from the plots: K = 3.55, T = 1.67 leading to the formulation of the Nomoto model (24) $\begin{aligned} P = \frac{ψ}{δ} (s) = \frac{3.55}{s (1 + 1.67 s)} \end{aligned}$ (24) Based on the transfer function obtained and recalling (Equation17(17) $W = \frac{L}{1 + L} = \frac{K_{dc} ω_{n}^{2}}{s^{2} + 2 ξ ω_{n} s + ω_{n}^{2}}$ (17) ), the control law C is calculated as

(25) $\begin{aligned} C = \frac{K_{c}}{s + a} \end{aligned}$ (25) As depicted in Figure , the controlled Tito Neri vessel follows the reference path stipulated by the Guidance algorithm. The dynamics remain stable as the ship avoids environmental obstacles, despite noticeable path deviations, particularly around curves. Such transient deviations can be attributed to the control law's focus on heading dynamics, while not considering the physical correlations between sway and yaw motion. The model predictive control (MPC) design formulation, a comprehensive model-based control strategy, accounts for multi-input-multi-output dynamics by incorporating a full 3DoF ship model. Recalling the expression (Equation14(14) $M \dot{ν} + C (ν) ν + D ν = τ .$ (14) ), in the following the expanded matrices: (26) $\begin{aligned} M_{RB} & = [\begin{matrix} m & 0 & 0 \\ 0 & m & m x_{g} \\ 0 & m x_{g} & I_{z} \end{matrix}], M_{A} = [\begin{matrix} - X_{\dot{u}} & 0 & 0 \\ 0 & - Y_{\dot{v}} & - Y_{\dot{r}} \\ 0 & - Y_{\dot{r}} & - N_{\dot{r}} \end{matrix}], \end{aligned}$ (26) (27) $\begin{aligned} C_{RB} & = [\begin{matrix} 0 & 0 & - m (x_{g} r + v) \\ 0 & 0 & mu \\ m (x_{g} r + v) & - mu & 0 \end{matrix}], \end{aligned}$ (27) (28) $\begin{aligned} C_{RB} & = [\begin{matrix} 0 & 0 & Y_{\dot{v}} + (Y_{\dot{r}} + N_{\dot{v}}) r \\ 0 & 0 & - X_{\dot{u}} \\ - Y_{\dot{v}} v - (Y_{\dot{r} + N_{\dot{v}}}) r & - X_{\dot{u}} u & 0 \end{matrix}], \end{aligned}$ (28) (29) $\begin{aligned} D & = [\begin{matrix} - X_{u} & 0 & 0 \\ 0 & - Y_{v} & - Y_{r} \\ 0 & - N_{v} & - N_{r} \end{matrix}], \end{aligned}$ (29) The values of the ship model parameter are summarised in Table .

Figure 10. xy Cartesian plot using PID controller.

Table 2. Tito Neri 3DoF model parameters.

Display Table

As illustrated in Figure , this consideration of the correlated dynamics between lateral and angular motion enables the control algorithm to guide the ship along the reference route with zero path deviation. However, it is worth noting that the control effort demanded by the MPC controller is considerable, especially for the actuators. Figure elucidates how the control law requires a force that continually changes at a frequency of approximately 0.2 Hz, resulting in a substantial load on the actuators. This is primarily to compensate for the sway drift during navigation, which, however, might induce aggressive actuator dynamics, potentially leading to mechanical failure. Key performance indicators (KPIs) of the control strategies analysed in this study are concisely summarised in Tables and . Figures – document the results of the PID-based control approach. Figure specifically displays the ship's heading control performance, demonstrating no overshoot and a rise time of 2.1 seconds. This performance, along with a steady-state settling time of 3 seconds, reveals the adaptability of the system across diverse avoidance maneuvers. Figure presents the control inputs directed towards the ship's azimuth propeller system, demonstrating how the frequency response-based control law can generate feasible control inputs for the motors driving the azimuth rotation mechanics. Figures and elaborate on the finer aspects of the low-level control on the ship's propulsion system. The propellers maintain a constant speed on both the port and starboard sides throughout the navigation. With a rise time of 1.6 seconds and a settling time of 3.4 seconds, the performance of the propeller speed control reemphasizes the compatibility of the control law with the ship's hardware. In conclusion, the PID-based approach coupled with the frequency response-based control law produces an efficient navigation control system. Its robustness, exhibited in both heading control and actuator-level operations, suggests potential suitability for real-world maritime navigation applications, thereby aiding the enhancement of a vessel's autonomous capabilities.

Figure 11. xy Cartesian plot using MPC controller.

Figure 12. MPC – control forces requested.

Figure 13. PID – control forces requested.

Figure 14. Ship's heading evolution.

Figure 15. Azimuth propeller control inputs.

Figure 16. Controlled rpms, portside propeller.

Figure 17. Controlled rpms, starboard propeller.

Table 3. KPIs control strategies.

Download CSV Display Table

Table 4. KPIs GNC scheme adopted in the experimental setup.

Display Table

4. Conclusions and further research

A complete GNC framework has been presented in this paper, enhancing the importance of how these three modules are dependent upon one another to achieve autonomous navigation for a surface vessel.

In this work the sub-systems have only been evaluated independently however the authors intend to evaluate the complete GNC framework under experimental collision avoidance scenarios in future work. The MILP guidance algorithm as presented serves as a proof on concept for a global trajectory planning algorithm. It generates an alternative, minimally deviated, evasive path for the vessel whilst respecting kinematic constraints. In the context of real ship dynamics, especially for large vessels, 12 seconds planning time may be adequate given the slower responses time and longer distances required for maneuvers. However, it is important to emphasise that the primary focus was on establishing a proof of concept, and the presented complete GNC scheme was chosen to balance computational efficiency with realistic and feasible model-scale operational scenarios. Faster hardware can theoretically minimise the computational load even further, for application where fast moving objects are in the environment as well. Comparatively, existing maritime navigation systems, which often rely on a combination of radar and manual inputs, operate on a similar timescale for trajectory planning.

Additionally, the guidance approach focuses on immediate spatial positioning rather than a predictive, kinematic based, modelling using the obstacle vessels headings. Incorporating future obstacle position prediction could, however, represent an interesting direction for future work, where the dynamics of moving obstacles could be integrated into the guidance system for more complex navigational scenario taking into account their heading and the possibility to estimate the velocity of the surrounding objects as well.

The perception approach investigated in this work demonstrates the promise of a multi-modal network architecture in achieving situational awareness for obstacle avoidance activities. In the research the camera sensor was primarily chosen for its accessibility and compatibility with the model scale used, to provide a proof of concept. The D435i, with its active IR projections, is not merely a stereo camera and is less suited for large-scale sea applications due to its range and sensitivity to environmental conditions. In future work alternative for true development can be considered, for example a mix of specialised marine-grade LiDAR system with more sophisticated camera technology. Nonetheless, the underlying principles of the Computer Vision algorithm demonstrated here would largely remain applicable. The Faster R-CNN based VesselNet proves itself to be a suitable candidate for detecting vessels and distinguishing their classification from colour image frames. Depth-data from a stereovision sensor can provide sufficient point-cloud resolution to detect both vessel orientation and distance, albeit within the range capacity of the device's hardware. Although the overall performance of the BearingNet limited the performance of the network, this sub-optimal performance could be improved in future work through additional training data and fine tuning of the network model. Even with a limited overall system performance, the slow-moving nature of vessels and the capacity for multiple frames to be processed each second could still permit application to autonomous vessels. Additionally, it is also noteworthy to mention that in our approach, setting missing depth values to the maximum of their surroundings represents an optimistic scenario. For applications in real-world environments, a more robust method for filling these gaps could be explored. This could include statistical techniques or machine learning-based interpolation strategies, which potentially offer a more precise estimation of missing data points in a sparse depth field. Moreover, considering the notably high recall rate of 98% achieved by VesselNet, primarily attributed to the training on model-scale ships similar to those used in our test scenarios, it becomes important to address the potential limitations of our current dataset. The application of VesselNet in real-world, densely trafficked maritime environments may reveal performance degradation if the training dataset lacks diversity and fails to integrate a comprehensive array of ship types and conditions comparable with those encountered in actual navigation settings. This underlines the necessity of expanding the dataset used in this work to include a broader spectrum of real-world maritime images, enhancing the neural network capability to accurately detect and classify vessels under a wide range of scenarios. Furthermore, the impact of potentially noisier detected object positions, as well as the occurrence of false negatives and positives, demands in-depth analysis because of their substantial influence on the path planning executed by the Guidance system. While the inherent control stability of ship system in a path following context may mitigate the risks associated with path inconsistencies, the increased control effort demanded from actuators in response to detection inaccuracies could potentially compromise vessel integrity. Therefore, the pursuit of further research to enhance VesselNet by training with real ship navigation data becomes essential. This effort is crucial for guaranteeing the robustness and dependability of our proposed guidance and control strategy among the diverse and challenging dynamics of marine environments.

In conclusion, this paper has discussed two model-based control design strategies – a second-order frequency response correlation-based controller and a Model Predictive Control (MPC). Each of these controllers demonstrated the potential to contribute to autonomous navigation when integrated with the previously described Guidance and Navigation systems. Although the MPC offered a higher precision in path following, the second-order frequency response correlation-based controller emerged as the preferable choice for practical implementation. This decision is justified by its robustness, superior computational efficiency, and lesser demand on the actuator side, hence ensuring the preservation of the ship's mechanical integrity. In this study, it was found that the impulses sent to the actuator by the Model Predictive Control algorithm were primarily due to uncertainties in the maneuvering model parameters Table . Essentially, if the ship anticipates that a certain force will move it to a predicted position, but subsequently finds itself in a different position in the next time frame, the MPC algorithm attempts to correct this discrepancy by counteracting the previous effort.

However, this conclusion does not undermine the potential benefits of the MPC and methods to mitigate the illustrated control spikes should be further investigates in future research also to explore ways of reducing the computational demands of more advanced and theoretically superior control strategies like the Non-linear Model Predictive Control and back-stepping. Such advancements are crucial, especially considering that these control systems must work in tandem with computationally intensive guidance and navigation methods.

Moreover, as our study was primarily based on the Tito Neri model vessel, future research can also investigate the applicability and performance of these control strategies on different vessel types. Furthermore, the exploration of control design methodologies that take into account other dynamics, beyond the heading dynamics considered in this study, could offer a more comprehensive approach to autonomous navigation.

Lastly, further consideration of the potential effects of environmental conditions on the performance of these control systems, and how they can be effectively accounted for in the control design process, is also warranted. While the core of this research primarily focussed on the implementation aspects of a full GNC system, further simulations can play an important role in refining the proposed methods and validating conclusions. Going forward, the plan is to expand simulation scenarios to include diverse maritime environments and dynamic obstacle behaviours, integrating real-world data to bridge the gap between simulation and practical application. This will not only test the robustness and adaptability of our system under varied conditions but also help us understand its scalability and actual efficiency in crowd maritime environments. This will also ensure the adaptability and reliability of autonomous maritime navigation systems in real-world maritime conditions.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

The research presented in this paper was supported by the Researchlab Autonomous Shipping (RAS) and funded by the European Union and the European Regional Development Fund, as part of the Interreg North Sea Region project AVATAR.

Notes on contributors

V. Garofano

Ir. V. Garofano is Research/Education support Engineer and Coordinator of the technical activities in the ResearchLab Autonomous Shipping (RAS) at the Department of Maritime & Transport Technology, Delft University of Technology. His research and engineering work include implementation of control algorithms with focus on autonomous inland vessels navigation, software-hardware integration and design of robust control strategies.

M. Hepworth

Ir. M Hepworth is Research Engineer at the ResearchLab Autonomous Shipping at Delft University of Technology. He is specialised in the application of computer vision and artificial intelligence to situational awareness tasks in the field of maritime autonomy.

R. Shahin

R. Shahin is an MSc student of the Aerospace Faculty at Delft University of Technology and formerly a Research Assistant at the ResearchLab Autonomous Shipping (RAS) working on state of the art Guidance Algorithm.

Y. Pang

Dr. ir. Y. Pang is assistant professor in the department of Maritime and Transport Technology, Delft University of Technology. His research mainly focuses on the intelligent control for transport technology and logistics processes including the fields of autonomous shipping applications.

R. R. Negenborn

Prof. Dr. R. R. Negenborn is full professor in Multi-Machine Operations & Logistics at Delft University of Technology. He is head of the Section Transport Engineering & Logistics and leads the Researchlab Autonomous Shipping (RAS). His research interests include automatic control and coordination of transport technology, with a focus on smart (autonomous) shipping and smart logistics applications. He has over 300 peer reviewed publications and leads NWO, EU and industry funded research, such as the Horizon 2020 program NOVIMOVE.

References

Aström KJ, Murray RM. 2004. Feedback systems: an introduction for scientists and engineers. Princeton University Press.
Google Scholar
Beltrán J, Guindel C, Moreno FM, Cruzado D, Garcia F. 2018, Nov. Birdnet: a 3D object detection framework from lidar information. 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, Hawaii, USA, p. 3517–3523. IEEE. https://doi.org/10.48550/arXiv.1805.01195
Google Scholar
Choy CB, Stark M, Corbett-Davies S, Savarese S. 2015. Enriching object detection with 2d–3d registration and continuous viewpoint estimation. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, p. 2512–2520. IEEE.
Google Scholar
Coleman D. 2024. Obstacle avoidance and path planning using mixed integer programming. Department of Computer Science, University of Colorado.
Google Scholar
Corić M, Mandzuka S, Gudelj A, Lušić Z. 2021 May. Quantitative ship collision frequency estimation models: a review. J Marine Sci Eng. 9:533. doi: 10.3390/jmse9050533
Web of Science ®Google Scholar
Deraj R, Kumar RS, Alam MS, Somayajula A. 2023. Deep reinforcement learning based controller for ship navigation. Ocean Eng. 273:113937. doi: 10.1016/j.oceaneng.2023.113937
Web of Science ®Google Scholar
Dey C, Sen SK. 2020. Industrial automation technologies. Boca Raton: CRC Press. doi:10.1201/9780429299346.
Google Scholar
Doyle J, Francis B, Tannenbaum A. 1990. Feedback control theory. London: Macmillan Publishing Group.
Google Scholar
Fossen TI. 2011. Handbook of Marine Craft Hydrodynamics and Motion Control. John Wiley & Sons, Ltd.
Google Scholar
Garofano V, Hepworth M, Shahin R. 2022. Obstacle avoidance and trajectory optimization for an autonomous vessel utilizing milp path planning, computer vision based perception and feedback control. Proceedings of the International Ship Control Systems Symposium. Vol. 16. Conference Proceedings of iSCSS, International Ship Control Systems Symposium, Delft, 8–10 November 2022.
Google Scholar
Haseltalab A, Garofano V, van Pampus M, Negenborn R. 2020. Model predictive trajectory tracking control and thrust allocation for autonomous vessels. IFAC-PapersOnLine. 53(2):14532–14538. doi: 10.1016/j.ifacol.2020.12.1457 21st IFAC World Congress 2020; Conference date: 12-07-2020 Through 17-07-2020.
Google Scholar
Hepworth M, Garofano V, Reppa V, Negenborn RR. 2021. Collision avoidance for autonomous inland vessels using stereovision. https://repository.tudelft.nl/islandora/object/uuid:5e00f536-fa99-4116-8f87-cfb01a525fcb?collection=education.
Google Scholar
Huang J, Rathod V, Sun C, Zhu M, Korattikara A, Fathi A, Fischer I, Wojna Z, Song Y, Guadarrama S, et al. 2016, Nov. Speed/accuracy trade-offs for modern convolutional object detectors. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, p. 7310–7311. IEEE.
Google Scholar
Kato S, Takeuchi E, Ishiguro Y, Ninomiya Y, Takeda K, Hamada T. 2015. An open approach to autonomous vehicles. IEEE Micro. 35(6):60–68. doi: 10.1109/MM.2015.133
Web of Science ®Google Scholar
Khalid Z, Mohamed EA, Abdenbi M. 2013. Stereo vision-based road obstacles detection. 2013 8th International Conference on Intelligent Systems: Theories and Applications, SITA 2013. p. 1–6. IEEE and British Council, Rabat, Marocco.
Google Scholar
Kim K, Hong S, Choi B, Kim E. 2018 Jun. Probabilistic ship detection and classification using deep learning. Appl Sci. 8(6):936. doi: 10.3390/app8060936
Google Scholar
Lin T-Y, Dollár P, Girshick R, He K, Hariharan B. 2017. Feature pyramid networks for object detection. IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA. IEEE.
Google Scholar
Lin T-Y, Dollár P, Girshick R, He K, Hariharan B. 2020. Focal loss for dense object detection. IEEE Trans Pattern Anal Mach Intell. 42(2):318–327. doi: 10.1109/TPAMI.34
PubMed Web of Science ®Google Scholar
Nomoto K. 1967. A simplified analysis on ship motion under manoeuvre and proposed steering. Osaka Univeristy, Japan. http://resolver.tudelft.nl/uuid:5c3dd568-013d-4d85-8dfc-2579d84cede8
Google Scholar
Parkes A, Savasta T, Sobey A, Hudson D. 2022. Power prediction for a vessel without recorded data using data fusion from a fleet of vessels. Expert Syst Appl. 187:115971. doi: 10.1016/j.eswa.2021.115971
Web of Science ®Google Scholar
Parkes A, Sobey A, Hudson D. 2018. Physics-based shaft power prediction for large merchant ships using neural networks. Ocean Eng. 166:92–104. doi: 10.1016/j.oceaneng.2018.07.060
Web of Science ®Google Scholar
Richards A, How JP. 2002. Aircraft trajectory planning with collision avoidance using mixed integer linear programming. Proceedings of the 2002 American Control Conference, Anchorage, AK, USA. IEEE
Google Scholar
Sánchez-Beaskoetxea J, Basterretxea-Iribar I, Sotés I, Machado M. 2021 Jan. Human error in marine accidents: is the crew normally to blame? Maritime Transp Res. 2:100016. doi: 10.1016/j.martra.2021.100016
Google Scholar
Schouwenaars T, De Moor B, Feron E, How J. 2001. Mixed integer programming for multi-vehicle path planning. 2001 European Control Conference (ECC), Porto. IEEE.
Google Scholar
Sciavicco L, Siciliano B. 1998. Modeling and control of robot manipulators, Lorenzo Sciavicco and Bruno Siciliano. Journal of Intelligent and Robotic Systems. 21(1):99–100. doi: 10.1023/A:1007979428654
Google Scholar
Sivaraj S, Dubey A, Rajendran S. 2023. On the performance of different deep reinforcement learning based controllers for the path-following of a ship. Ocean Eng. 286:115607. doi: 10.1016/j.oceaneng.2023.115607
Web of Science ®Google Scholar
Vielma JP. 2015. Mixed integer linear programming formulation techniques. SIAM Rev. 57(1):3–57. doi: 10.1137/130915303
Web of Science ®Google Scholar
Wang W, Gheneti B, Mateos LA, Duarte F, Ratti C, Rus D. 2019. Roboat: an autonomous surface vehicle for urban waterways. 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, p. 6340–6347. IEEE.
Google Scholar
Ye J, Li C, Wen W, Zhou R, Reppa V. 2023. Deep learning in maritime autonomous surface ships: current development and challenges. J Marine Sci Appl. 22:584–601. doi: 10.1007/s11804-023-00367-1
Web of Science ®Google Scholar

Obstacle avoidance and trajectory optimisation for an autonomous vessel utilising MILP path planning, computer vision based perception and feedback control

Abstract

1. Introduction