Canadian Journal of Remote Sensing
Journal canadien de télédétection
Volume 50, 2024 - Issue 1
Review Article

Synthetic Images for Georeferencing Camera Images in Mobile Mapping Point-clouds


Article: 2300328 | Received 04 Sep 2023, Accepted 23 Dec 2023, Published online: 16 Jan 2024

Abstract

Accurate three-dimensional mapping and digital twinning provide a powerful tool for effective maintenance of civil infrastructure and support efficient future planning of new developments. Three-dimensional mapping can be efficiently performed with a Mobile Mapping System (MMS) that records geospatial data from platform-mounted sensors. However, it is expensive to continuously update datasets by re-capturing with MMS. This paper outlines a novel method allowing camera-only approaches for updating and change detection. It resolves key issues with inherent resolution differences between MMS laser scanner point-clouds and camera images. An intermediary is used to register the two disparate datasets. This novel approach to synthetic camera images (SCIs) bridges the differences between MMS point-clouds and camera images and aids in coarse registration of camera images to an outdoor MMS point-cloud. SCI coarse registration precision is maximized by generating surfaces, interpolating intensity values, and reducing noise with a median filter. Landmark features coarsely register the camera image to the MMS point-cloud. The coarse registration is most precise when the whole scene is captured either from the same location as the SCI or farther from the scene. Landmarks precisely detect scenes when changes are less than 20% and foliage does not exceed 20% of the camera image.


This article is part of the following collections:
Technological Advancements in Urban Remote Sensing

Introduction

Mapping infrastructure is important for the economy and public safety (Oliver et al. Citation2018). Mapping can be efficiently performed with a mobile mapping system (MMS) that records geographical data from platform-mounted sensors. The platform is moved along a route to observe as-built infrastructure in urban corridors or capture the landform for analysis. MMS data is used to generate a 3D model for infrastructure planning, environmental monitoring, emergency response planning, and resource management. Accurate and up-to-date as-built 3D models support the expansion and upgrading of infrastructure such as roads, railways, powerlines, bridges, and tunnels. Discussion of a 3D city model provides an example.

A 3D city model with the shapes of buildings and other existing objects can be reconstructed with highly detailed spatial information. It is becoming increasingly common for industry to use detailed point clouds to support engineering design. For example, this paper describes a 3D model generated from an MMS that is used to plan the expansion of the light rail transit system in the northeast of the City of Calgary. The model is used to identify infrastructure impacted by the future rail line and any encroachment on private property. These models, and subsequent infrastructure upgrades, are also key needs for autonomous vehicles and smart cities (Lemmens Citation2018; Oliver et al. Citation2018; Shahrour Citation2018).

Sometimes spatial mapping data and related 3D models are incomplete due to occlusions or because datasets become outdated in dynamic environments. Occlusions occur when an object obstructs another object, for example, a tree obstructing the view of a house wall as shown in Figure 1(a).

Figure 1. Causes of incomplete data: (a) occlusions and (b) construction altering infrastructure.


MMS data is often captured in dynamic environments like a city, where changes frequently occur due to construction or temporary features such as traffic and pedestrians, as seen in Figure 1(b). The current method to resolve occlusions and changes is to re-observe the occluded or changed portion of the map with a full set of observations or multiple passes of the MMS. Updating map data is resource intensive and costly because it requires subsequent MMS passes; there is no straightforward updating method, especially for 3D legacy models, without recapturing the whole dataset or a subset of it.

An alternate solution is to capture newer 2D camera images to fill the missing data, or map gaps, caused by occlusions or changes. Lowry et al. (Citation2016) suggest that updating or complementing maps with image sequences may fill map gaps by recognizing the changes from the image sequence, but a single image or image pair is currently unable to fill a map gap. Filling map gaps can be accomplished by registering the camera images to the MMS point-cloud. The pose of the camera image facilitates registration to the point-cloud. However, not all cameras have sensors with pose information from global navigation satellite systems (GNSS), and limited GNSS line-of-sight in urban corridors or indoor spaces makes it difficult to acquire a precise pose from sensors. In addition, georeferencing camera images to an MMS point-cloud can be used in other applications such as forensic scene documentation, vehicle navigation, or construction engineering.

Registration of subsequent camera images is challenging because of unknown relative orientation parameters between the MMS and camera. This problem is described in Figure 2, where the camera image has unknown translation and orientation parameters represented by the red vectors. The vector (C_iC) represents the translation parameters. The orientation parameters, roll (ω), pitch (ϕ), and yaw (κ), make up the relative orientation rotation matrix from the camera image to the MMS frame (R_i^G R_G^{MMS} = R_i^{MMS}). Therefore, it is necessary to independently determine the camera's relative orientation to the MMS point-cloud. Because the MMS point-cloud is usually georeferenced, the relative orientation is defined by the external orientation parameters (EOPs). EOPs include position (X_{MMS}, Y_{MMS}, Z_{MMS}) and orientation (ω, ϕ, and κ) in the global frame. The method can also be used when the point-cloud is not georeferenced; the relative orientation parameters are then position and orientation in the mapping frame.

Figure 2. Pose of MMS and camera.


The camera cannot be finely registered to the point-cloud without coarse registration, or an initial estimation of the pose. In camera-to-camera terms this is known as place recognition. To the authors' best knowledge, there is no discussion in the literature of place recognition between these two disparate datasets, because their spectral and spatial resolution differences leave no common primitives between point-clouds and camera imagery.

This paper poses the research question: is it possible to coarsely register subsequent imagery to existing 3D MMS models without prior image pose information? The paper responds by describing: (i) a novel method for registering newer camera images to the MMS point-cloud, captured either by a non-technical user or from crowd-sourced images that do not contain pose information; (ii) a new workflow for generating an intermediary between MMS point-clouds and subsequently captured images; and (iii) a novel adaptation of existing camera-to-camera feature registration methods. The synthetic camera image (SCI) provides an intermediary to address the spectral and spatial resolution differences between the two datasets.

The paper is set out in three main sections: Literature Review, Methods, and Results and Discussion. The literature review describes the background on methods used for addressing the spectral and spatial resolution differences and identifies the need for a novel SCI generation method for MMS point-clouds. The methods section describes the implementation of a novel synthetic image generation method for MMS point-clouds to coarsely register subsequently captured camera images using the Calgary Greenline dataset. It also describes a novel method for feature matching between camera and synthetic images for coarse registration that uses landmark features, employing AlexNet, a linear generic pretrained CNN, to generate invariant feature descriptors from edgeboxes. These methods are used to create the synthetic images and landmark features that minimize resolution differences with camera images and are tested for precise coarse registration.

Literature review

Mismatched spatial and spectral resolutions are challenges for registration between point-clouds and camera images. Ku et al. (Citation2018) and Forkuo and King (Citation2004) examine these spatial resolution challenges, and Wendt and Heipke (Citation2006) examine the spectral resolution challenges for terrestrial laser scanners. Their research has not addressed how these challenges pertain to registration of MMS point-clouds.

Ku et al. (Citation2018) identified a spatial resolution challenge where a terrestrial laser scanner beam does not strike a surface: the corresponding points are at infinity and do not appear in the model. For example, the point-cloud image in Figure 3 has no points in the sky, but the camera image shows clouds that can sometimes be identified as features for matching.

Figure 3. Challenges registering between camera images and point-clouds.


The second spatial resolution challenge is that sparse point distribution is insufficient for comparison to the 2D photograph because of gaps between the points (Forkuo and King Citation2004; Ku et al. Citation2018). These gaps are seen on the roof and the large wall in Figure 3, where black indicates no points between the white intensity pixels. It is also an issue where the outline of the garage is occluded in the camera image when captured from another observation trajectory.

Spectral resolution defines a sensor’s ability to discern electromagnetic spectrum features. Cameras capture visible light, while MMS laser scanners measure narrow-bandwidth laser return energy (Wendt and Heipke Citation2006). Camera pixels convey RGB values, whereas MMS pixels represent NIR return energy (Gonzalez and Woods Citation2008). Camera intensity, derived from RGB, differs from MMS point-cloud returned intensity complicating camera to MMS point-cloud registration.

For fine registration between these two sensors, Forkuo and King (Citation2004) proposed generating a synthetic camera image (SCI) as an intermediate to translate primitives. Their SCI method does not appear to have been pursued but provides a potential intermediate.

Forkuo and King (Citation2004) used image processing and feature registration to address the spectral and spatial challenges in registering camera images to high-density terrestrial-scanner point-clouds. Their SCI is generated from a high-density point-cloud captured with a terrestrial laser scanner. Each point from the point-cloud is represented as a pixel in the SCI. The camera image is captured with the same pose as the synthetic camera, and Harris corners (Harris and Stephens Citation1988) register camera images to the point-cloud because they ignore the spectral resolution challenges. Their registration method generates an intermediary SCI with a synthetic camera at the same time and location as the camera image, resulting in no scale or temporal changes. However, this method is incomplete, as it does not address variance in scale or pose or the MMS resolution challenges described in Figure 3. A method is required to support coarse registration between camera images and point-clouds by providing an intermediary.

The following subsections will describe the literature for addressing the spatial and spectral resolution challenges. It is separated into three subsections: (i) Synthetic Image Generation; (ii) Camera Image Processing; and (iii) Coarse Registration Features. Synthetic Image Generation describes the literature for addressing the spatial resolution and synthetic image capture using surface generation, raytracing, and intensity interpolation. Camera Image Processing describes literature for spectral resolution processing for matching with the MMS point-cloud and downsampling for the spatial resolution. Coarse Registration Features examines the literature for feature detection, description and matching in camera-to-camera place recognition or coarse registration.

Synthetic image generation

To direct the literature review, it is first necessary to introduce the novel approach to generating SCIs from large mobile mapping point-clouds that resolves the resolution differences in coarse registration of newer camera images to MMS point-clouds. The novel SCI method, described in Figure 4, involves surface generation, raytracing, interpolation, and image processing. The literature relating to these methods is described in the subsections below.

Figure 4. SCI generation flowchart.


Surface generation

Surface generation handles sparsity and occlusion issues within point-clouds. A surface is generated from the point-cloud to provide a digital object between points that removes the distances between points and gives the synthetic camera something to detect. The surface also occludes objects from outside the scene or that should be hidden from the synthetic camera viewpoint. However, any surface generation method must also minimize artifacts and holes in surfaces generated from the MMS point-cloud to align with the camera images.

Poisson, Delaunay, and fast surface reconstruction (fast recon) are ubiquitous surface generation methods that were explored in this research for MMS applications to minimize artifacts (Kazhdan et al. Citation2006; Delaunay Citation1934; Marton et al. Citation2009). Poisson surface generation creates a continuous vector field from the oriented points, finds the closest scalar function gradient that matches the vector field, and extracts the isosurface as seen in Figure 5(a) (Kazhdan et al. Citation2006). The closed surface is then cropped by adjusting the scalar field based on the point density (Rumpler et al. Citation2013). Cropping the lower-density values removes enclosing surfaces and artifacts from the surface. However, it also leaves gaps in the surface where objects are occluded or the incidence angle was too great, as seen in Figure 5(b). As a result, the cropping values are chosen manually to minimize the number of artifacts while preventing the appearance of gaps.
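As a brief illustration of the density-based cropping step described above, the sketch below shows how Poisson reconstruction with low-density vertex removal could look using the Open3D library. The library choice, file names, octree depth, and quantile threshold are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: Poisson surface reconstruction with density-based cropping
# (Open3D); file names and thresholds are illustrative assumptions.
import numpy as np
import open3d as o3d

pcd = o3d.io.read_point_cloud("mms_segment.ply")   # hypothetical point-cloud segment
pcd.estimate_normals()                              # Poisson requires oriented points

# Reconstruct the implicit surface and extract the isosurface.
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9)

# Crop the enclosing surface by removing vertices supported by few points,
# analogous to the manual density-threshold cropping described above.
densities = np.asarray(densities)
mesh.remove_vertices_by_mask(densities < np.quantile(densities, 0.05))

o3d.io.write_triangle_mesh("mms_segment_poisson.ply", mesh)
```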

Figure 5. Poisson surface reconstruction: (a) the initial surface reconstruction; (b) cropped surface showing gaps and some artifacts.


In contrast, Delaunay triangulation projects the points onto the x-y plane. The points are triangulated so that no other point is inside the circumcircle of any triangle (Delaunay Citation1934). The mesh structure is then returned to 3D. The result is no need for manual thresholding, but this method is known for producing sliver triangles that create jagged edges, and it has difficulty with non-uniform point density (Hilton et al. Citation1996). Fast recon is a surface growing algorithm (Marton et al. Citation2009). A k-neighborhood is selected for each point by searching for the point's nearest neighbors within a radius r. The radius adapts to the local point density by multiplying the distance of point p to its nearest neighbor (d0) by a user-specified threshold (μ). The neighborhood is projected onto a plane tangential to p and its neighborhood. Non-visible points are pruned. Points are also connected to p and consecutive points by edges to form triangles. These triangles have a maximum angle criterion that describes the characteristics of the holes in the surface: larger angles cause fewer holes in the model. The triangles also have a minimum angle criterion that helps to minimize slivers and artifacts in the model.

These surface generation methods are tested in the research to find the surface generation method that produces the fewest artifacts and holes in MMS point-clouds. This is done to resolve spatial differences including proper object occlusion and point distribution between the MMS point-cloud and a camera image. The chosen surface generation method is the fast recon method and is described in detail in the results section. The surface is struck by a ray cast through the synthetic camera model. This provides a pixel intensity value for the struck surface. The next section describes the synthetic camera model followed by the interpolation of the intensity value.

Ray tracing & orientation

Raytracing simulates geometric optics by tracing oriented rays through a synthetic camera into object space, which in this paper is the mobile mapping space (Formella and Gill Citation1995). Generally, raytracing involves thousands of intersecting rays from various illumination models. This paper uses it to capture the SCI by projecting the ray from the image plane, through the synthetic camera model, and into the MMS surface space with no additional light sources. Terdiman’s raytracing algorithm then uses a bounding volume tree to build a hierarchy of bounding volumes to speed up collision detection (Terdiman Citation2003; Stich et al. Citation2009). The intensity value of the intersection between the ray and the surface is then stored as the pixel value.

The ray is cast from the optical center (C_j) through the pixel on the image plane at point p. When the ray strikes a surface at point P, the intensity value is stored in the pixel. Rays are iteratively cast through each pixel on the image plane described by the synthetic camera model. The synthetic camera model contains distortion parameters including the principal point offset (c_x, c_y), lens distortion coefficients (not shown), and shear distortion (not shown). This paper assumes an ideal camera model where all camera equation distortion values are ignored. The resulting camera matrix is

[1] K = \begin{bmatrix} 1/p_s & 0 & c_x \\ 0 & 1/p_s & c_y \\ 0 & 0 & 1/f \end{bmatrix}

where f is the focal length, c_x and c_y are the principal point offsets as shown in Figure 6, and p_s is the pixel size for the synthetic camera. These parameters are chosen based on the desired camera model. This research discusses parameter choices in the coarse registration subsection of the methods section and the principal distance subsection of the results and discussion section.

Figure 6. Projection of a ray through the synthetic camera model into mapping space.


The camera orientation in the point-cloud frame is the rotation matrix R_{sci}^{PC}. The rotation matrix is then transformed to get the orientation from the SCI frame to the point-cloud frame (R_{sci}^{M}) (Ellum and El-Sheimy Citation2002).

[2] R = R_3(\omega) R_2(\phi) R_1(\kappa) = R_{sci}^{M}

The direction (d) of the ray passing through a pixel is calculated as

[3] d = K \, R_{sci}^{M} \begin{bmatrix} j \\ i \\ 1 \end{bmatrix}

where j is the column location of the pixel and i is the row location of the pixel. The rotation matrix is applied to the mapping frame and is described in the SCI orientation parameters subsection of the results and discussion section.

Finally, the position of the synthetic camera must be generated in the MMS point-cloud frame. This vector is represented by C_j, seen in Figure 6. A ray is cast from C_j in the direction of each pixel in the synthetic camera array.
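A minimal numpy sketch of the ray set-up, following Equations (1)–(3) as written above, is given below. The rotation convention, pixel size, focal length, and principal point values are illustrative assumptions, not the authors' configuration.

```python
# Hedged sketch of ray generation through the synthetic camera model,
# following Equations (1)-(3) as written; parameter values are illustrative.
import numpy as np

def rotation_matrix(omega, phi, kappa):
    """R = R3(omega) R2(phi) R1(kappa), per Equation (2) as written."""
    c, s = np.cos, np.sin
    R1 = np.array([[1, 0, 0], [0, c(kappa), -s(kappa)], [0, s(kappa), c(kappa)]])
    R2 = np.array([[c(phi), 0, s(phi)], [0, 1, 0], [-s(phi), 0, c(phi)]])
    R3 = np.array([[c(omega), -s(omega), 0], [s(omega), c(omega), 0], [0, 0, 1]])
    return R3 @ R2 @ R1

# Ideal synthetic camera model (Equation (1)): pixel size ps, focal length f,
# principal point (cx, cy); lens and shear distortions are ignored.
ps, f = 1.5e-6, 4.0e-3          # assumed values (metres)
cx, cy = 2016, 1512             # assumed principal point (pixels)
K = np.array([[1 / ps, 0, cx],
              [0, 1 / ps, cy],
              [0, 0, 1 / f]])

R_sci_M = rotation_matrix(0.0, np.deg2rad(10.0), np.deg2rad(90.0))

def ray_direction(j, i):
    """Direction of the ray through pixel column j, row i (Equation (3))."""
    d = K @ R_sci_M @ np.array([j, i, 1.0])
    return d / np.linalg.norm(d)
```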

Intensity interpolation

To find the spectral or intensity value of the surface where the ray strikes, it is necessary to interpolate from the nearest MMS point-cloud points. The natural neighbor interpolation method is used to preserve edges and minimize aliasing and artifacts in the final SCI. Natural neighbor interpolation has seven steps, shown in Figure 7 (Fisher Citation2006, 97–108; Sibson Citation1981).

Figure 7. Voronoi cells for natural neighbors (adapted from Sibson (Citation1981)).


The new point (P) is inserted, represented by the blue point in Figure 7. Voronoi cells are drawn around the blue point and its neighbors; the white and blue areas represent these cells. The volume is calculated for these cells. P is then removed and the Voronoi cells are redrawn, with black lines, around the neighbors only. The volumes are recalculated, and the differences provide the weights, represented by green circles and numbers, for the intensity values. Natural neighbor interpolation shows significantly less aliasing and is used in feature matching between the synthetic and camera images. The camera image must be processed to minimize the spectral differences between the point-cloud and the camera image.
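For reference, one way to sketch natural neighbor interpolation of intensity at the ray-surface intersection locations is shown below using MetPy's implementation. The library choice and the toy coordinates and intensities are assumptions; the authors' implementation is not specified.

```python
# Hedged sketch: natural neighbor interpolation of scanner intensity at
# 2D ray-surface intersection locations. MetPy is an assumed library choice.
import numpy as np
from metpy.interpolate import natural_neighbor_to_points

# points: (N, 2) planar coordinates of nearby MMS points on the struck surface
# values: (N,) laser return intensities at those points (toy values)
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 1.5]])
values = np.array([0.20, 0.35, 0.40, 0.55, 0.30])

# xi: (M, 2) locations where rays intersect the surface (one per SCI pixel)
xi = np.array([[0.4, 0.6], [0.7, 0.2]])

interpolated = natural_neighbor_to_points(points, values, xi)
print(interpolated)   # one interpolated intensity per queried pixel location
```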

Camera image processing

As identified earlier, the spectral information differs between synthetic camera images and RGB cameras. To minimize this difference for matching, it is necessary to process the camera RGB image by transforming the color space. It is found that the hue, saturation, and intensity (HSI) model minimizes the spectral differences between the two image types (Gonzalez and Woods Citation2008).

HSI models decouple intensity information from color information (hue and saturation). Hue describes the pure color attribute, that is, whether the color is red, green, blue, yellow, etc. Saturation describes the amount of white light, and intensity describes the grey level. Note that this intensity is different from the laser scanner intensity, which describes the energy return of the laser.
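A short sketch of the RGB-to-HSI conversion, using the standard formulation found in Gonzalez and Woods, is given below; it is an illustrative implementation rather than the authors' code.

```python
# Hedged sketch: RGB -> HSI conversion (standard Gonzalez and Woods formulation).
import numpy as np

def rgb_to_hsi(rgb):
    """rgb: float array in [0, 1] of shape (..., 3). Returns (hue, saturation, intensity)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    eps = 1e-8
    intensity = (r + g + b) / 3.0
    saturation = 1.0 - np.minimum(np.minimum(r, g), b) / (intensity + eps)
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + eps
    theta = np.arccos(np.clip(num / den, -1.0, 1.0))
    hue = np.where(b <= g, theta, 2.0 * np.pi - theta)   # radians
    return hue, saturation, intensity
```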

It is necessary to downsample the image to provide similar spatial resolution to the synthetic image and overcome the spatial resolution problem. This is done by using the ratio of the camera ground sampling distance to the MMS point spacing, as seen in Figure 8, where

[4] \text{Downsample ratio} = \frac{GSD_{camera}}{PointSpacing_{MMS}} = \frac{p_s \cdot Distance / f}{\sin(\alpha_{scan}) \cdot Distance} = \frac{p_s}{f \sin(\alpha_{scan})}

Figure 8. MMS point spacing vs camera ground sampling distance.


MMS point distribution is influenced by many variables including distance from the scanner, scanning geometry such as the scanning angle, and the velocity of the MMS (Puttonen et al. Citation2013). Figure 8 is a schematic representation of point distribution for simplicity.
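A small helper following Equation (4) as reconstructed above could look like the sketch below; the pixel size, focal length, and angular step are assumed values chosen only to yield a ratio near the roughly 10% figure reported later in the results.

```python
# Hedged sketch: downsample ratio from Equation (4); input values are illustrative.
import numpy as np

def downsample_ratio(pixel_size, focal_length, scan_angle_step_rad):
    """GSD_camera / PointSpacing_MMS = p_s / (f * sin(alpha_scan))."""
    return pixel_size / (focal_length * np.sin(scan_angle_step_rad))

# e.g. 1.4 micrometre pixels, 4 mm focal length, 0.2 degree angular step (assumed)
ratio = downsample_ratio(1.4e-6, 4.0e-3, np.deg2rad(0.2))
print(ratio)   # about 0.10, i.e. downsample the camera image to roughly 10%
```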

Coarse registration features

Traditionally, keypoint feature-based registration is used for camera-to-camera registration. Scale-invariant feature transform (SIFT) (Lowe Citation2004) and speeded-up robust features (SURF) (Bay et al. Citation2008) are two keypoint features used to successfully identify matching points for place recognition between camera images (Gupta and Cecil Citation2014). These keypoint methods depend on intensity values to define the feature for matching.

Alternatively, image to point-cloud registration uses linear features or corner features, such as the Hessian features used by Forkuo and King (Citation2004). These features are robust to spectral and spatial differences. However, these methods are not viewpoint invariant. In contrast, CNNs are robust to spectral and spatial differences and are viewpoint invariant, but they are not generalizable and are outperformed by traditional image processing and feature-based registration (Ku et al. Citation2018). Sunderhauf et al. (Citation2015) suggest landmark features for camera-to-camera coarse registration that are pose and intensity invariant.

Landmark features are a combination of traditional image processing with feature-based registration and CNNs. Objects are extracted from an image by drawing edge boxes around long contiguous edges. These edgeboxes are then converted into descriptors by running them through layers of a generic pretrained CNN. These descriptors are then compared using the cosine distance to find matches (Sunderhauf et al. Citation2015). This research adapts this method for camera image to point-cloud coarse registration.

Feature detection: edgeboxes

Edgeboxes use edges in the image to detect objects in the scene (Dollar and Zitnick Citation2013). A sliding window is moved around the image to find object proposals based on contours. Every proposal is given a score to identify if there is an object. A box is drawn around the object to capture the whole line and is stored as a feature.

CNNs are limited by six factors: (i) overfitting (McGrail and Rhodes Citation2020); (ii) data insufficiency (Xu et al. Citation2021); (iii) lack of interpretability (Mi et al. Citation2020); (iv) computational complexity (Kearns Citation1990); (v) continuous training (Parisi et al. Citation2019); and (vi) catastrophic forgetting (Parisi et al. Citation2019). These limitations can be avoided by extracting output before the decision layers, avoiding the necessity of training the network. Avoiding these limitations is why traditional features are used in this research. Therefore, edgeboxes were chosen over region-based CNNs such as Fast-RCNN and Mask-RCNN. Edgeboxes' parameterization allows fine-tuning so that the processed camera image and the SCI capture the same features without the need for training on both datasets. The possibility of catastrophic forgetting, computational complexity, and overfitting is mitigated by edgeboxes.

How well edgeboxes identify objects from the candidate bounding boxes is measured using intersection over union (IoU), where the area of the intersection between candidate and ground truth boxes is divided by their union area (Zitnick and Dollár Citation2014). Some applications, like camera-to-camera place recognition, work with a lower IoU score because they use similar images of similar scenes. However, for this application an IoU score greater than 0.85 is necessary because of the need for complete and distinct object features from both SCI and camera images. Lower scores mean fewer object features are found in both datasets.

There are four important criteria that impact the IoU (Zitnick and Dollár Citation2014): (i) window step size; (ii) non-maximal suppression; (iii) minimum box score; and (iv) number of boxes. Each of these criteria is explained below. Observed values are provided in the results section.

The window step size defines how far the sliding window is moved in the image. Larger step sizes detect fewer overlapping objects, meaning fewer detected objects and less feature detail. This leads to the need for different step sizes for SCIs and camera images. Smaller step sizes are used in the SCI so that the edgeboxes capture more objects and more detail. Since camera images have more detail, they require a larger step size to synchronize with the synthetic images that require smaller step sizes. This limits the number of smaller objects, like mailboxes and outdoor lights, that would usually be detected by a smaller step size in the camera image.

The non-maximal suppression (NMS) is used to remove boxes if there is high overlap between two boxes. It accepts the box with the higher score. The threshold is used to determine the amount of overlap that is acceptable.

The minimum box score removes any boxes where the score is lower than the set threshold. The score is calculated from contours captured by the candidate box. Long edges connected with shallow curves are scored higher than edges connected with sharp curves.

The number of boxes is the maximum number of objects detected in the image. If edgeboxes detect fewer objects than the maximum number of boxes, it stores the smaller number of boxes. For example, if observing a shoe and the maximum number of boxes is 10 but we only count 3 – tongue, laces, and sole – the edgebox stores 3 for the observed objects.
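As a hedged illustration of these four parameters, OpenCV's contrib EdgeBoxes implementation could be configured as sketched below. The library choice, the pretrained edge model file, and the parameter values (which echo the SCI settings reported later in the results) are assumptions, not the authors' implementation.

```python
# Hedged sketch: object-proposal detection with OpenCV's EdgeBoxes
# (opencv-contrib-python). Model file and parameter values are illustrative.
import cv2
import numpy as np

image = cv2.imread("scene.jpg")                                   # hypothetical input
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0

# Structured edge detection requires a pretrained edge model file.
edge_detector = cv2.ximgproc.createStructuredEdgeDetection("model.yml.gz")
edges = edge_detector.detectEdges(rgb)
orientation = edge_detector.computeOrientation(edges)
edges = edge_detector.edgesNms(edges, orientation)

edge_boxes = cv2.ximgproc.createEdgeBoxes()
edge_boxes.setAlpha(0.65)      # sliding-window step size
edge_boxes.setBeta(0.85)       # non-maximal suppression threshold
edge_boxes.setMinScore(0.1)    # minimum box score
edge_boxes.setMaxBoxes(33)     # maximum number of boxes

# Depending on the OpenCV version, this returns boxes or (boxes, scores).
result = edge_boxes.getBoundingBoxes(edges, orientation)
boxes = result[0] if isinstance(result, tuple) else result

# Crop each proposal so it can be passed to the CNN descriptor stage.
crops = [image[y:y + h, x:x + w] for (x, y, w, h) in boxes]
```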

Once the features are detected and an edgebox is drawn around the object, the box is cropped from the image. These cropped features are then passed through a CNN to extract the feature descriptor.

Feature description: neural network layers

CNNs are multi-layered neural networks specializing in recognizing visual patterns in images and are regularly used in camera-to-camera place recognition (Chen et al. Citation2014). CNNs rely on teaching datasets to make decisions. These decisions have been shown to be less generalizable than traditional feature matching methods and require colorized point-clouds (Ku et al. Citation2018). The six limitations of CNNs described above are avoided by stopping before the dense decision layers while using the convolutional and pooling layers to generate a robust, distinctive landmark feature descriptor (Neubert et al. Citation2013). It is for these reasons that this research adopts an approach that ignores the dense decision layers for feature description.

The landmark feature method employs AlexNet, a linear generic pretrained CNN, for generating invariant feature descriptors from edgeboxes (Krizhevsky et al. Citation2012). The edgeboxes are passed through the neural layers and do not require environment-specific training. AlexNet consists of four convolutional layers, two pooling layers, and linear feature-filtering neurons, providing intensity and pose-invariant solutions as seen in Figure 9. This network efficiently processes images by convolving, downsampling, and filtering for unique features (Krizhevsky et al. Citation2012). The output from the third convolutional layer is flattened and used as the feature descriptor in camera-to-camera imagery because Neubert et al. (Citation2013) found it was robust to appearance changes.
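A hedged sketch of extracting such a descriptor with torchvision's pretrained AlexNet follows; the exact layer index, input size, and preprocessing are assumptions consistent with taking the output of the third convolutional layer.

```python
# Hedged sketch: landmark descriptor from a pretrained AlexNet's third
# convolutional layer (torchvision); layer indexing and preprocessing are
# illustrative assumptions.
import torch
from torchvision import models, transforms
from PIL import Image

alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).eval()
# In torchvision's AlexNet, features[6] is the third convolutional layer.
conv3_extractor = torch.nn.Sequential(*list(alexnet.features.children())[:7])

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def describe(crop_path):
    """Flattened conv3 activation used as the landmark feature descriptor."""
    crop = preprocess(Image.open(crop_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        activation = conv3_extractor(crop)
    return activation.flatten()
```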

Figure 9. AlexNet Convolutional Neural Network design diagram for creating landmark feature descriptor (Patil et al. Citation2023).


Figure 10 shows the output of the first convolutional layer. Linear features are shown in the top row, and a color/intensity filter is highlighted in the third row. This paper tests these layers to find the most distinct feature for registration between camera and synthetic images.

Figure 10. Example of AlexNet’s first convolutional layer output.


Feature matching: cosine distance

The output from the chosen layer is flattened into a one-dimensional array. In camera-to-camera applications this flattened descriptor is passed into the decision layers for matching; however, for generalizability the flattened descriptor is here matched using the cosine distance (Sunderhauf et al. Citation2015). The cosine distance between two flattened descriptor vectors (r and s) is

[5] \cos(\theta) = \frac{r \cdot s}{\lVert r \rVert \, \lVert s \rVert}

where better matches result when the cosine distance is closest to one.

The synthetic feature descriptors are stored in a database with each image name, pose of the synthetic camera, and the feature descriptors for that image. The camera image feature descriptors are queried against this synthetic image database. The pose of the matched SCI is used as an approximate position of the camera image. The largest features are matched first. The best matches are refined by matching smaller features.
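A minimal sketch of querying camera-image descriptors against the SCI descriptor database with the cosine measure of Equation (5) is given below; the data structures and ranking strategy are illustrative assumptions.

```python
# Hedged sketch: querying camera-image descriptors against the SCI descriptor
# database using the cosine measure of Equation (5). Data structures are
# illustrative assumptions.
import numpy as np

def cosine_similarity(r, s):
    """cos(theta) = (r . s) / (|r| |s|); values closer to one are better matches."""
    return float(np.dot(r, s) / (np.linalg.norm(r) * np.linalg.norm(s)))

# sci_database: list of dicts holding the SCI name, the synthetic camera pose,
# and the descriptors of its landmark features (largest feature first).
def query_largest_feature(camera_descriptor, sci_database, top_k=15):
    scored = [(cosine_similarity(camera_descriptor, entry["descriptors"][0]), entry)
              for entry in sci_database]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]   # candidate SCIs whose pose approximates the camera pose
```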

Methods

This section describes the implementation of the methods described in the literature review. Development of a new coarse registration method requires a new approach to SCI generation to resolve spatial and spectral resolution differences between the MMS point-cloud and subsequently captured camera images.

In the new method, synthetic images are generated from the MMS point-cloud following the steps shown in the left column of the flowchart in Figure 11. A surface is generated from the MMS point-cloud using the fast recon surface generation method. The thresholds are tested and described in the results and discussion section. The synthetic image is generated using raytracing with orientation parameters perpendicular to the MMS trajectory and captured at 10-meter intervals. Rays are traced through the camera model, and the pixel value is interpolated with the natural neighbor method to minimize aliasing where each ray strikes a surface. A median filter is applied to the SCI to remove salt-and-pepper noise while preserving edges. A database of landmark feature descriptors is then generated from the series of SCIs. The testing of these parameters is described in the following subsections.

Figure 11. Flowchart for coarse registering subsequent camera images to MMS legacy model.


Camera image features are created by converting the image into the HSI color model, where saturation and intensity are used to represent the SCI intensity, and downsampling is done to match the spatial resolution of the SCIs. Features are detected in the processed image and passed through the CNN for description. These features are then queried against the SCI feature database using size similarity and the cosine distance to find the best match. Precision is used for the analysis of the registration as it represents the true positive registrations over the total number of registrations. Another criterion, recall, represents the retrieved relevant instances over all relevant instances. This case has only false positives and true positives, with no false negatives; therefore, recall is not descriptive of the results and is not used in the analysis.

The precision of these matches is compared with the correct, manually selected matches. A correct match is counted when the coarse registration and manually selected matches are the same. The total number of matches is used to determine the precision by

[6] \text{precision} = \frac{\text{correct matches}}{\text{correct matches} + \text{incorrect matches}}

Figure 11 shows the novel approach for coarse registration of a subsequent camera image to an MMS point-cloud using SCIs. The spatial resolution problem is solved using SCIs through surface generation, raytracing, natural neighbor intensity interpolation, and median filtering. Surface generation fills in the spaces between point-cloud points. Raytracing projects rays from the synthetic camera to find pixels that strike the surface. Natural neighbor intensity interpolation identifies the SCI pixel intensity value, and the median filter preserves edges and reduces noise in the final SCI. This addresses the gap in research into coarse registration of camera images to non-colorized point-clouds.

Synthetic camera image

The surface generation, raytracing, intensity interpolation, and camera image processing were tested in an indoor controlled experiment. A camera was set up at the same location as a terrestrial laser scanner. The experiment emulated Forkuo and King (Citation2004); however, fine registration was done using SIFT and SURF instead of Harris corners. The three surface generation methods were tested and compared in the indoor environment. They were examined for large holes and artifacts created by the surface generation methods. The raytracing and intensity interpolation were tested for their ability to generate the best representation of the laser scanner intensity image while minimizing aliasing.

The camera image processing was tested by matching SIFT and SURF features between the processed camera image and the SCI. The camera image was downsampled at the calculated 50% as well as at 75%, 33%, and 25% for comparison in the indoor experiment. Different color model representations were also tested: grayscale and combinations of the HSI and CMY color models were examined for matching. The coarse registration method is tested on an MMS data capture in an outdoor environment.

SIFT features were used to test the novel SCI method. SIFT works well on a dense field captured from a terrestrial scanner and provides precise matches between an indoor SCI and a captured image. However, SIFT descriptors are not unique when observing outdoor scenes. Therefore, it was necessary to find a feature and descriptor that could match outdoor scenes to outdoor scenes. Landmark features are used because they mix global and linear features, resulting in a feature robust to scale and intensity variations (Sunderhauf et al. Citation2015). Landmark features are modified to coarsely register camera images to the SCI using edgeboxes, neural network layers, and traditional matching techniques.

Coarse registration

The coarse registration method is tested on a large MMS dataset following the proposed light rail transit (LRT) line in Calgary, Alberta, Canada. The MMS dataset was captured in 2016 to expand Calgary's LRT in the city's northeast. Synthetic camera images are generated in two areas of the LRT route to capture a variety of building types and densities, including high-density commercial and residential, medium-density commercial, low-density commercial and residential, and parkland. The dataset is depicted in Figure 12. It covers approximately 10 km but was reduced to 14 city blocks, 8th Avenue to 20th Avenue and Beddington Boulevard to Bergen Crescent, as these contained a sufficient variety of land uses and streetscapes.

Figure 12. Map of test area.


The synthetic image generation methods are tested in a non-controlled environment where outdoor scenes are captured perpendicular to the MMS trajectory at different intervals. This tests four methods: (i) surface generation, (ii) raytracing, (iii) intensity interpolation, and (iv) landmark features.

Nine influencing variables were tested in the outdoor experiments: (i) number of camera edgeboxes; (ii) number of synthetic image edgeboxes; (iii) height of synthetic image; (iv) vertical synthetic camera angle; (v) synthetic image frequency; (vi) camera spatial resolution; (vii) principal distance; (viii) camera angle; and (ix) camera distance. These are broken down into two categories: camera image experiments and synthetic image experiments.

Table 1 summarizes the experimental variables that are used to test registration between camera images and SCIs. Each experiment was performed while keeping the other variables constant.

Table 1. List of test parameters.

Figure 13 schematically shows the synthetic image experiments. Height of the synthetic image is the vertical distance of the synthetic camera from the model surface. Vertical angle tests the vertical pose angle of the synthetic image, chosen to minimize road capture and maximize scene information. Synthetic image frequency tests how frequently the synthetic images are captured; this is tested at 20 meters, which captures each house and lot in the model, and at 10 meters, which captures additional images of each house and lot. The number of SCI edgeboxes examines the step size, NMS, and maximum number of boxes.

Figure 13. Synthetic image parameters.


The camera images were captured along the MMS trajectory as shown in Figure 14. These images show a variety of different scenes, including high and low-density commercial and residential, parkland, and empty lots. They were taken at multiple locations with different cameras to apply the different variables described above.

Figure 14. Camera image capture locations.


Figure 15 shows the variables for the camera image experiments, which are associated with the camera image used for matching with the synthetic images. The angle test uses camera images taken normal to the building and at 45°. The camera distance is tested from the same side of the road as the scene (minimum distance), from the MMS and SCI capture location, and from the opposite side of the road (maximum distance). The spatial resolution tests the downsampling of the camera image to match the spatial resolution of the synthetic image in the scene. The number of camera edgeboxes is tested in the same way as for the SCIs. Principal distance is tested by capturing images using three different cameras: an Apple iPhone XR with a focal length of 4 mm, a Canon Optix automatic zoom with a focal length of 7 mm, and a Canon EOS Rebel T2i with a focal length of 18 mm.

Figure 15. Camera image parameters.


During the outdoor experiments it was found that there was significant noise that interfered with the feature matching. A median filter was applied to reduce the noise.

Median filtering

Surface noise relating to the MMS laser scanner is observed in the SCI outdoor scenes with roads and trees. These scenes require processing to minimize noise while preserving edges for the feature matching. Figure 16 shows patterns that cause issues with the feature descriptor. Since linear features are important for the feature descriptors, it is necessary to apply a median filter to preserve the edges (Gonzalez and Woods Citation2008).
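A minimal sketch of applying such a filter to the SCI is shown below; the use of OpenCV and the 5 × 5 kernel size are illustrative assumptions.

```python
# Hedged sketch: median filtering the SCI to suppress salt-and-pepper noise
# while preserving edges; file names and kernel size are illustrative.
import cv2

sci = cv2.imread("sci_intensity.png", cv2.IMREAD_GRAYSCALE)   # hypothetical SCI
sci_filtered = cv2.medianBlur(sci, 5)                          # 5 x 5 median filter
cv2.imwrite("sci_intensity_filtered.png", sci_filtered)
```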

Figure 16. Road noise and patterns mis-identified as linear.


The median filter reduces salt-and-pepper noise on buildings, roads, and other horizontal surfaces while maintaining edges. The filter decreases the amount of noise on the road; however, it does not fully resolve the road noise, which decreases the precision of the feature matching. A simple solution is to tilt the synthetic camera up from horizontal to minimize the visible amount of road in the synthetic images. This has the added benefit of capturing more distinct features of taller buildings in higher-density areas.

Results and discussion

Surface generation

Poisson reconstruction works well for indoor rooms and for outdoor single buildings or scenes, but it has difficulty with outdoor scenes when generating surfaces for large blocks. It leaves holes and artifacts in the MMS synthetic image like those seen in Figure 17. The Delaunay method had similar results in the indoor tests and also produced holes and artifacts in the outdoor experiment. Delaunay also resulted in slivers where the point-cloud was sparse.

Figure 17. Artifacts and holes from the Poisson surface reconstruction.


The indoor and outdoor experiments revealed that fast recon produced surfaces with the fewest holes, artifacts, and slivers. Fast recon also provided consistent results using the same thresholds, compared to Poisson, which required threshold tuning for each MMS point-cloud segment. Therefore, the fast recon method is adopted for use in synthetic image generation from MMS point-clouds.

For the outdoor experiment the best maximum angle for multiple blocks is 160°. Unfortunately, this causes trees to appear as large conglomerations of triangles, but it provides the fewest holes for building surfaces. For multiple blocks a minimum angle of 5° removes artifacts and slivers while keeping narrow feature details for objects like light standards and signposts.

The synthetic images are generated by raytracing through a synthetic camera model and interpolating intensity values where the rays intersect a generated surface.

Intensity interpolation

Nearest neighbor intensity interpolation produced aliasing that prevented feature matching. The aliasing is most prevalent in Figure 18(a), where no interpolation method was used. Figure 18(b) and (c) show the nearest neighbor and weighted linear interpolation methods, which result in square and diamond shaped aliasing. The best method for limiting the amount of aliasing was found to be the natural neighbor interpolation shown in Figure 18(d).

Figure 18. Indoor spectral interpolation: (a) pixels without interpolation; (b) aliasing of nearest neighbor; (c) aliasing using weighted linear interpolation using 5 nearest neighbors; (d) natural neighbor interpolation.


Camera image processing

Grayscale, HSI, and CMY color models were explored for feature-based registration to SCI images to validate Forkuo and King's (Citation2004) method. The CMY model provided no precise feature matches. Grayscale provided some matches, but the HSI color model provided the most precise feature matches, as seen in Table 2. The combination of saturation and intensity provided the best matches using SIFT on the indoor tests and shows that the spectral resolution challenge can be addressed using this combination. The most precise matching results come from combining the squared saturation and intensity values, where each camera pixel (p_i) is calculated using Equation (7).

[7] p_i = S_i^2 + I_i^2
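A small sketch of Equation (7), building on the HSI conversion sketched earlier, is shown below; the channel scaling is an illustrative assumption.

```python
# Hedged sketch: processed camera pixel value from Equation (7),
# combining the squared saturation and intensity channels.
import numpy as np

def processed_pixel(saturation, intensity):
    """p_i = S_i^2 + I_i^2 (channels assumed scaled to [0, 1])."""
    return np.square(saturation) + np.square(intensity)
```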

Table 2. Indoor SIFT and SURF feature matching results.

Unfortunately, this research finds that in several initial MMS SCI tests, both SIFT and SURF failed to register camera images and MMS SCIs because of poor feature descriptor matching caused by the spatial and spectral differences. Therefore, a feature descriptor robust to spectral and spatial differences is needed for feature-based registration. However, the parameters of the landmark features need to be tuned to address the differences between the camera and MMS synthetic camera images.

Edgebox parameters

It was necessary to tune the parameters of the edgebox detection for the camera and synthetic datasets because some spectral and spatial differences persisted. The camera images require larger step sizes but lower NMS to capture features like the SCI. Tuning preserves feature similarity between the camera and synthetic images.

For the SCIs the window step size is tested. A smaller step size of 0.65 for the synthetic images provides a larger number of unique features. This differs from the processed camera images, where the 0.65 step size captures too many smaller details, like a mailbox or porch light seen in Figure 19, that are not visible features in the SCI. Increasing the camera image step size to 0.75 resolves the issue by increasing the precise detection of objects while reducing efficiency. This confirms the assertions of Zitnick and Dollár (Citation2014).

Figure 19. Correct match with extracted features and cosine distance of best matches.


The SCI edgebox NMS is set at 0.85 to remove excessive trees and repetitive boxes, shown as purple features in Figure 19. It is reduced to 0.75 for the camera image, seeking higher precision for the more detailed image. The minimum edge score is set to 0.1 for both and does not impact the results.

From empirical testing, a maximum of 33 boxes for the synthetic images limits repetitive boxes and boxes of surrounding trees. The camera images use 50 boxes to capture features for matching with the synthetic images. It is necessary to go through the synthetic images and remove some of the duplicate trees because the triangles in the trees generate high box scores but not distinct features for matching. The smallest features, like windows, also produce mismatches and need to be removed. Manually removing some small features for better matching results in a final feature count of 25 to 33 boxes for the synthetic images.

These boxes are then sorted from largest to smallest for improved matching because the larger features provide a good initial estimate of location and the smaller features refine the scene match between the camera and synthetic images.

SCI orientation parameters

The SCI internal parameters emulate the iPhone 7 camera because initial outdoor tests were compared against an iPhone 7. SCIs are captured along the MMS trajectory from the average easting of the MMS over multiple passes and at 20-meter intervals. The height of the synthetic camera is the recorded MMS height. The orientation parameters are perpendicular to the direction of travel in both east and west directions and parallel to the horizontal plane.

It is assumed that matching the camera and SCI parameters minimizes the difference between the datasets and maximizes matching potential. Initial tests support this assumption, as the camera images match when taken from the same location as the SCI with the same approximate orientation. However, fine-tuning of the height, orientation, and frequency parameters is done after matching issues arose when testing against the larger dataset. The focal length parameter is also tested by capturing 184 images with three cameras: (i) 128 images captured by the iPhone XR, (ii) 27 images captured by the Canon EOS Rebel, and (iii) 28 images captured by the Canon Optix.

Height of synthetic image

Matching the synthetic camera height to that of an average user provides more reliable results. The 2D location of the synthetic camera is found using the MMS trajectory, and 1.5 meters is added to the nearest point's height to match the approximate height of an average user.

The camera images match with the SCIs when captured from a similar location to the synthetic camera. This supports the research question about non-technical users capturing images because it is robust to scale and minor viewpoint changes.

Vertical synthetic camera angle

The initial observation angle was 0° from the horizon which results in capturing a significant portion of the road. This causes some matching issues because the road contains intensity artifacts from the interpolation method causing confusion between the road and buildings. Observing at 0° also makes it difficult to see complete buildings and unique building features in high-density areas because the perspective sees too much ground and not enough of the structure. To resolve this problem, a 10° vertical observation angle is applied to the external orientation parameters. It improves the coarse registration by removing the road in most SCIs and provides better building capture.

Synthetic image frequency

The 20-m spacing between synthetic images successfully matches when the buildings are well defined, fully captured, and have minimal flora. However, the matching struggles when the features are less unique. For example, higher-density buildings only capture a few storefronts and not the identifying building features. This leads to more geometric burstiness, or repetitive features, which results in registration challenges because the features are not unique. Geometric burstiness was defined by Sattler et al. (Citation2016) and the term is used here for convenience. Increasing the capture frequency to 10 meters resolves some burstiness.

Generating SCIs every 10 m means that most scenes are captured multiple times. For example, residential houses have three SCIs capturing the view, one with a normal incidence angle to the building and one SCI on either side of the building. Additionally, it minimizes the distance between the SCI and camera locations. Most of a building’s identifying features are also captured in the 3 SCIs. This allows weighting the smaller matches between neighboring SCIs.

The largest feature is queried against the database first to find the top 15 matches. The largest feature is used because it emulates a global feature. Smaller features are then queried to refine the best matches. A weight is given to the smaller features that are also contained in adjacent SCIs. Weighting allows the capture of unique features but removes the mis-registration caused by a poorly placed camera image or an SCI that does not capture the complete building.
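A hedged sketch of this coarse-to-fine query, in which the largest feature narrows the candidate SCIs, smaller features refine the ranking, and matches repeated in adjacent SCIs receive extra weight, is given below. The data structures, similarity threshold, weighting value, and adjacency test are illustrative assumptions.

```python
# Hedged sketch of the coarse-to-fine query described above; weights and
# thresholds are illustrative assumptions, not the authors' values.
import numpy as np

def cosine(r, s):
    return float(np.dot(r, s) / (np.linalg.norm(r) * np.linalg.norm(s)))

def coarse_register(camera_descriptors, sci_database, top_k=15, adjacency_bonus=0.1):
    # Stage 1: rank SCIs by their largest-feature similarity (acts like a global feature).
    ranked = sorted(sci_database,
                    key=lambda e: cosine(camera_descriptors[0], e["descriptors"][0]),
                    reverse=True)[:top_k]

    # Stage 2: refine with smaller features; weight features that are also
    # matched in SCIs adjacent along the trajectory.
    def score(entry):
        total = 0.0
        for cam_desc in camera_descriptors[1:]:
            best = max((cosine(cam_desc, d) for d in entry["descriptors"][1:]),
                       default=0.0)
            shared = any(cosine(cam_desc, d) > 0.9
                         for adj in entry.get("adjacent", [])
                         for d in adj["descriptors"])
            total += best + (adjacency_bonus if shared else 0.0)
        return total

    best_match = max(ranked, key=score)
    return best_match["pose"]   # approximate pose of the camera image
```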

Camera parameters

This section presents the results of the camera parameter tests. The spatial resolution is tested at 10%, 30%, 33%, and 50% against the matching of the feature descriptors. The impact of the principal distance on the registration results is tested for the three cameras at 4, 7, and 18 millimeters. Finally, the camera location is tested at three distances: the same side of the street, the same position as the synthetic camera, and the opposite side of the street. The angle is also tested, normal to the building and at an oblique angle.

Camera spatial resolution

The camera spatial resolution was initially calculated to be approximately 10%. However, this downsampling lost too much data and resulted in few detectable features in the resulting processed camera image. Reducing the downsampling from 10% to 33% resolved the issue.

An MMS usually uses multiple passes to capture all data, and this impacts the line and point spacing. In turn, point spacing impacts the downsampling ratio calculated in Equation (4). Reducing the denominator by a factor of 3 increases the resulting ratio by a multiple of 3, which gives an estimated ratio of 30%. The best matching occurred when using a ratio of 33%. The interpolation of the intensity information for the synthetic camera and the three passes are contributing factors to the 33% ratio.

Principal distance

Camera principal distance does not produce any difference in registration between cameras. Table 3 shows the precision differences between cameras. The Canon EOS and Optix cameras captured a smaller dataset; their residential images were of scenes that are successfully registered by the Apple iPhone XR.

Table 3. Focal length test results.

The overall precision for the datasets of the two additional cameras was lower, at 67% for the Canon EOS and 68% for the Optix, than for the Apple iPhone XR at 74%. The Canon cameras' residential scenes are 100% matches because they are captured at locations that are first matched using the iPhone XR. The light commercial and high-density precision appears smaller, but these scenes succeed and fail at the same locations as the iPhone XR. The result is that although the precision appears lower, the images are registered, or fail to register, at the same locations as the iPhone XR. This suggests the focal length parameter is insignificant in the coarse registration.

Camera angle and distance, and change detection

The iPhone XR camera images are rigorously tested to identify where the method works. The normal incidence angle images work best when there is a clearly defined building with unique features. The method has difficulty registering buildings with large changes or with extensive geometric burstiness. The 45° angular image tests fail to register due to (i) large variations in the detected features, (ii) additional objects in the image, and (iii) inclusion of objects that are not unique.

The residential buildings and high-density areas provide the best matches for normal incidence angle images due to higher uniqueness between buildings with more identifiable features. The light commercial areas contain fewer unique features. The images captured for the light commercial areas also contain greater changes between the SCI and the camera image, including large parking lots where the buildings are not visible in the MMS map. For example, one of the camera images contains a truck that takes up 20% of the image, shown in Figure 20 (left). This confuses the algorithm, which misidentifies the scene as containing truck features and then matches it with a car dealership.

Figure 20. Challenging photographs; (left) a truck dominates the scene; (right) the black building is new and does not match the existing model.

Since the research looks for changes, it is important to understand how changes impact the results. One of the later image series captures a building constructed after the MMS data capture. The image composition is 75% new buildings (left) and 25% old buildings (right). These images register to an incorrect location because of the paucity of features from the old building. The registration fails when 20% or more of the camera image differs from the SCI. This suggests that the user needs to capture at least 80% of an existing scene in the camera image to correctly register the scene.

The histogram in Figure 21 shows that changes greater than 20% lead to false positive matches, with a drop to zero when the similarity between the camera and synthetic scenes is less than 80%. There was one match at 79.9% that may underestimate the similarity between the two images because the edge box captured more than the object. The large number of matches at 95% reflects control of the scene change variable.
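A minimal sketch of this acceptance rule is shown below. The similarity score is assumed to be the percent-similarity measure plotted in Figure 21, expressed as a fraction, and the function name is hypothetical rather than part of the authors' implementation.

```python
def accept_coarse_registration(scene_similarity: float,
                               foliage_fraction: float) -> bool:
    """Trust a coarse registration only when less than 20% of the scene has
    changed and foliage stays at or below 20% of the camera image."""
    return scene_similarity >= 0.80 and foliage_fraction <= 0.20
```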

Figure 21. Histogram of percent similarity between images.

The best registration occurs when images are taken orthogonal to the building front and from locations similar to the SCI, including images captured from the opposite side of the street. However, as seen in Table 4, same-side street images are less likely to register. This is because the larger features provide the most uniqueness for matching between camera images and synthetic images, while the same-side view produces smaller, more detailed features that do not align with the SCIs. Table 5 shows that registration is poor when the incidence angle is 45°; therefore, images need to be captured closer to a normal incidence angle.

Table 4. Camera standoff distance test results.

Table 5. Incidence angles test results.

The orthogonal photos fail to register when foliage is excessive. The foliage causes geometric burstiness in the SCIs and results in mismatching, as shown in Figure 22 (right). This is also true in the parkland areas in Figure 22 (left). The trees do not provide enough uniqueness, and the detected background objects are not captured by the MMS because of its range constraints. The result is a lack of parkland registration.

Figure 22. Difficulty with excessive foliage in a scene.

The oblique-angle matching is poor because of limited SCI observations on surfaces parallel to the MMS direction of travel. Since there are fewer observations on these objects, the generated surfaces are sparse and full of artifacts and holes, which greatly reduces matching. However, some matches occur when the camera is close to the building and the features detected on the front of the building match with the objects.

Conclusion

Synthetic images bridge the gap between point-clouds and camera images. SCIs work for coarse registration in MMS point-clouds by matching the spatial and spectral resolutions of the two datasets.

SCI coarse registration precision is maximized by generating surfaces, interpolating intensity values for each pixel, and applying a median filter to reduce noise. SCIs are best captured from the same location as the MMS and should be generated every 10 m so that scenes are captured more than once. The SCI should be angled upward by 10° to reduce road capture and increase building capture, maximizing the number of unique features.
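A minimal sketch of these SCI placement rules follows. It assumes the MMS trajectory is available as an array of 3D positions; the function name and structure are illustrative rather than the authors' implementation.

```python
# Illustrative sketch, not the authors' implementation: place synthetic
# camera poses along the MMS trajectory every 10 m, tilted 10 degrees
# upward so the SCI favors building facades over the road surface.
import numpy as np

def sci_poses(trajectory_xyz: np.ndarray, spacing_m: float = 10.0,
              tilt_deg: float = 10.0):
    """Yield (position, heading_rad, pitch_rad) tuples for SCI rendering."""
    # Cumulative along-track distance at each trajectory vertex.
    dists = np.r_[0.0, np.cumsum(
        np.linalg.norm(np.diff(trajectory_xyz, axis=0), axis=1))]
    pitch = np.deg2rad(tilt_deg)
    for d in np.arange(0.0, dists[-1], spacing_m):
        i = min(np.searchsorted(dists, d), len(trajectory_xyz) - 2)
        dx, dy = trajectory_xyz[i + 1, :2] - trajectory_xyz[i, :2]
        heading = np.arctan2(dy, dx)  # point along the direction of travel
        yield trajectory_xyz[i], heading, pitch
```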

The coarse registration is most precise when the camera image captures the whole scene and is captured either at the same distance as the SCI or farther from the scene. Capturing from the opposing sidewalk provided precision equivalent to captures from the same distance. The camera image is downsampled to match the MMS point spacing, transformed into the HSI color model, and processed to match the intensity values of the model.
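The sketch below outlines this preprocessing chain under simplified assumptions: OpenCV is used for resizing, the HSI intensity channel is taken as the mean of the three color bands, and a simple percentile rescaling stands in for the paper's intensity-matching step.

```python
# Simplified sketch of the camera-image preprocessing summarized above.
# The exact intensity-matching procedure is not reproduced; a percentile
# rescaling to the SCI intensity range is used as a stand-in.
import numpy as np
import cv2

def preprocess_camera_image(rgb: np.ndarray, downsample_ratio: float,
                            sci_intensity: np.ndarray) -> np.ndarray:
    # Downsample so the pixel footprint approximates the MMS point spacing.
    small = cv2.resize(rgb, None, fx=downsample_ratio, fy=downsample_ratio,
                       interpolation=cv2.INTER_AREA)
    # HSI intensity channel: the mean of the three color bands.
    intensity = small.astype(np.float32).mean(axis=2)
    # Stand-in intensity matching: rescale into the SCI's intensity range.
    lo, hi = np.percentile(sci_intensity, (1, 99))
    return np.interp(intensity, (intensity.min(), intensity.max()), (lo, hi))
```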

Camera images captured at a normal incidence angle to the scene provide precise registration. However, when captured at an oblique angle of 45°, the camera captures features outside the SCI scene, resulting in poor registration. Camera principal distance does not impact coarse registration. Landmarks precisely detect scenes in the MMS when the changes to the scene are less than 20% and the foliage does not exceed 20% of the camera image. The landmark features work best where scenes are unique; more repetitive features in a scene result in less precise registration. Better surface generation reduces geometric burstiness and results in more precise registrations. Additional work may provide better surface generation for flora to reduce the impact of geometric burstiness.

Determining the orientation of an arbitrary photograph using the point-cloud is the first step in updating the point-cloud with subsequent imagery. The camera image itself does not fill in the point-cloud or update the 3D model. However, multiple images oriented using the described coarse registration method can produce a dense point-cloud, which can then be finely registered to the existing model to fill occlusion gaps or apply model updates.

Conflict of interest statement

The dataset and partial funding were provided by McElhanney Ltd., an industry sponsor seeking to reduce the costs of registering and updating MMS point-clouds.

Additional information

Funding

This work was supported by the Alberta Innovates Graduate Student Scholarship.
