
Texture recognition under scale and illumination variations

Pages 130-148 | Received 19 Dec 2022, Accepted 26 Sep 2023, Published online: 07 Oct 2023

ABSTRACT

Visual scene recognition is predominantly based on visual textures representing an object's material properties. However, the texture of a single material varies in scale and illumination angle because it is mapped onto the object's shape. We present a comparative study of the colour histogram, Gabor, opponent Gabor, Local Binary Pattern (LBP), and wide-sense Markovian textural features concerning their sensitivity to simultaneous scale and illumination variations. These textural features were selected from more than 50 published textural features because of their application dominance. Markovian features are information preserving, and we demonstrate their superior performance over the standard alternative textural features under scale- and illumination-variable observation conditions. We bound the scale variation by a factor of two, and the illumination variation includes illumination spectra, acquisition devices, and 35 illumination directions spanned above a sample hemisphere. Recognition accuracy is tested on textile patterns from the University of East Anglia database and wood veneers from the UTIA BTF database.

1. Introduction

A human observer recognizes a visual scene using shape and material attributes. Unfortunately, the appearance of a surface material changes vastly under variable observation conditions, negatively affecting its automatic and reliable recognition in numerous artificial intelligence applications. As a consequence, most material recognition attempts apply unnaturally restricted observation conditions, as shown in Varma and Zisserman (Citation2009), Bell et al. (Citation2015), and Gibert et al. (Citation2015).

Scale Invariant Feature Transform (SIFT) features modelled using the Johnson distribution, which makes the features invariant to rotation, scale, and illumination, were introduced in Hlaing and Zaw (Citation2018). Roy et al. (Citation2018) proposed a fractal dimension computed in a Gaussian scale-space texture representation; the fractal images are combined with LBP images using an indexing function to obtain scale-invariant features. Galois field-based features were used in Shivashankar et al. (Citation2018) for rotation and scale invariant texture classification. The rotation, scale, and illumination invariant features of Yang et al. (Citation2018) use LBP and log-polar energy-based descriptors in the dual-tree complex wavelet transform domain. Another rotation, illumination, and scale invariant variant of LBP (IRSLBP) was published by Veerashetty and Patil (Citation2020), where partial scale invariance was achieved using three different neighbourhood radii. A scale-selective and noise-robust extended LBP (SNELBP) was proposed in Luo et al. (Citation2022), and an LBP modification with some degree of rotation, illumination, and scale invariance was described in Shu et al. (Citation2022). Rotation and scale invariant features based on a combination of LBP and Gabor filters were presented in Muzaffar et al. (Citation2023). Although over 50 textural features were compared in Liu et al. (Citation2018) and Simon and Uma (Citation2018), we restricted our comparison to the most effective and, thus, dominant textural features.

An ideal model for representing and classifying materials should be capable of capturing the fundamental perceptual material properties. A multi-dimensional visual texture is an appropriate paradigm for modelling the surface reflectance function. The seven-dimensional Bidirectional Texture Function (BTF) is the best measurable representation, as shown in Haindl and Filip (Citation2013). BTF can be measured, even if it is not a trivial task, using state-of-the-art measurement devices, computers, and the most advanced mathematical models of visual data; see Haindl (Citation2023). Features derived from such multi-dimensional data models preserve information because they can synthesize data spaces resembling the original measurement data space. The authors have introduced a family of fast multi-resolution Markov random field-based models, and in Haindl and Vacha (Citation2015), these models are shown to be robust to illumination conditions.

This paper is an extended version of our ICCCI 2022 paper (Vácha & Haindl, Citation2022) with an additional comparison of textural features, a detailed analysis of recognition with different scale factors, and the reciprocity of training and test conditions. This paper's contribution is a joint test of scale and illumination variations simulating realistic visual scene recognition conditions, together with a comparative analysis against four of the most commonly used alternative textural features. For this analysis, we use the unique UTIA BTF visual material measurements introduced by Haindl et al. (Citation2015).

2. Markovian textural features

The texture is factorized into K levels of the Gaussian down-sampled pyramid, and subsequently each pyramid level is modelled by a wide-sense Markovian type of model – the Causal Auto-regressive Random field (CAR) model. Let us assume that each multispectral (colour) texture is composed of C spectral planes (usually C = 3), and $Y_r = [Y_{r,1}, \ldots, Y_{r,C}]^T$ is the multispectral pixel at location r. The multiindex $r = (r_1, r_2)$ is composed of the row index $r_1$ and the column index $r_2$. The spectral planes are mutually decorrelated by the Karhunen–Loève transformation. The two-dimensional models assume that the j-th spectral plane of the pixel at position r can be modelled as
(1) $Y_{r,j} = \gamma_j Z_{r,j} + \epsilon_{r,j}$,
where $Z_{r,j} = [Y_{r-s,j} : s \in I_r]^T$ is the $\eta \times 1$ data vector, $\epsilon_{r,j}$ is Gaussian white noise with constant but unknown variance, and $\gamma_j = [a_{1,j}, \ldots, a_{\eta,j}]$ is the $1 \times \eta$ unknown parameter vector. Some selected contextual causal or unilateral neighbour index shift set is denoted $I_r$ and $\eta = \mathrm{cardinality}(I_r)$; see the example in Figure 1. The texture is analysed in a chosen direction, where the multi-index t changes according to the movement on the image lattice I. Given the known CAR process history $Y^{(t-1),j} = \{Y_{t-1,j}, Y_{t-2,j}, \ldots, Y_{1,j}, Z_{t,j}, Z_{t-1,j}, \ldots, Z_{1,j}\}$, the estimate $\hat{\gamma}_j$ can be computed using fast, numerically robust recursive statistics (Haindl, Citation2012):
(2) $V_{t-1,j} = \begin{pmatrix} \sum_{u=1}^{t-1} Y_{u,j} Y_{u,j}^T & \sum_{u=1}^{t-1} Y_{u,j} Z_{u,j}^T \\ \sum_{u=1}^{t-1} Z_{u,j} Y_{u,j}^T & \sum_{u=1}^{t-1} Z_{u,j} Z_{u,j}^T \end{pmatrix} + V_0 = \begin{pmatrix} V_{y,j(t-1)} & V_{zy,j(t-1)}^T \\ V_{zy,j(t-1)} & V_{z,j(t-1)} \end{pmatrix}$,
(3) $\hat{\gamma}_{t-1,j}^T = V_{z,j(t-1)}^{-1} V_{zy,j(t-1)}$,
(4) $\lambda_{t-1,j} = V_{y,j(t-1)} - V_{zy,j(t-1)}^T V_{z,j(t-1)}^{-1} V_{zy,j(t-1)}$,
where the positive definite matrix $V_0$ represents prior knowledge.
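
As an illustration of Equations (2)–(4), the following minimal numpy sketch accumulates the statistics for a single decorrelated spectral plane and computes the parameter estimate and noise variance. It assumes a float-valued plane and a user-supplied causal neighbour shift list; the numerically robust recursive formulation of Haindl (Citation2012) is simplified here to a direct summation, and all names are illustrative rather than the authors' implementation.

```python
import numpy as np

def car_statistics(Y, shifts, prior_scale=1e-3):
    """Accumulate the CAR statistics of Eq. (2) for one spectral plane and
    return V_y, V_zy, V_z together with gamma_hat (Eq. 3) and lambda (Eq. 4).
    `shifts` is the causal neighbourhood I_r given as (row, col) index shifts."""
    rows, cols = Y.shape
    eta = len(shifts)
    V = prior_scale * np.eye(eta + 1)          # prior knowledge V_0 (assumed form)
    margin = max(max(abs(dr) for dr, _ in shifts),
                 max(abs(dc) for _, dc in shifts))
    for r1 in range(margin, rows):
        for r2 in range(margin, cols - margin):
            # data vector Z_r gathered from the causal neighbourhood
            Z = np.array([Y[r1 - dr, r2 - dc] for dr, dc in shifts])
            u = np.concatenate(([Y[r1, r2]], Z))
            V += np.outer(u, u)                # running sum of Eq. (2)
    V_y, V_zy, V_z = V[0, 0], V[1:, 0], V[1:, 1:]
    gamma_hat = np.linalg.solve(V_z, V_zy)     # Eq. (3)
    lam = V_y - V_zy @ gamma_hat               # Eq. (4)
    return V_y, V_zy, V_z, gamma_hat, lam
```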

Figure 1. Unilateral contextual neighbourhood Ir of sixth-order used for CAR model. X marks the current pixel, the bullets are pixels in the neighbourhood, the arrow shows movement direction, and the grey area indicates acceptable neighbourhood pixels.

2.1. Colour invariants

Our textural features are $a_{s,j}$, $\forall s \in I_r$, $j = 1, \ldots, C$, which are colour invariants, together with additional colour invariant features derived from this model in Haindl and Vácha (Citation2017). The spectral index j is omitted for simplification in invariants (6)–(20):
(5) $\alpha_1 = \sum_{j=1}^{C} a_{i,j}, \quad \forall i$,
(6) $\alpha_2 = 1 + Z_t^T V_z(t)^{-1} Z_t$,
(7) $\alpha_3 = \sum_{r \in I} (Y_r - \hat{\gamma}_t Z_r)^T \lambda_t^{-1} (Y_r - \hat{\gamma}_t Z_r)$,
(8) $\alpha_4 = \sum_{r \in I} (Y_r - \mu)^T \lambda_t^{-1} (Y_r - \mu)$,
(9) $\beta_1 = \ln\!\left( \frac{\psi(r)}{\psi(t)}\, |\lambda_t|\, |\lambda_r|^{-1} \right)$,
(10) $\beta_2 = \ln\!\left( \frac{\psi(r)}{\psi(t)}\, |V_z(t)|\, |V_z(r)|^{-1} \right)$,
(11) $\beta_3 = \ln\!\left( |V_z(t)|\, |\lambda_t|^{-\eta} \right)$,
(12) $\beta_4 = \ln\!\left( |V_z(t)|\, |V_y(t)|^{-\eta} \right)$,
(13) $\beta_5 = \mathrm{tr}\{ V_y(t)\, \lambda_t^{-1} \}$,
(14) $\beta_6 = \ln\!\left( \frac{1}{|I|} \sum_{\forall r \in I} p(Y_r \,|\, Y^{(r-1)})\, |V_y(t)|^{\frac{1}{2}} \right)$,
(15) $\beta_7 = \ln\!\left( \ln p(Y^{(t)} \,|\, M) + (\psi(t+1) + 2) \ln |V_y(t)| \right)$,
(16) $\beta_8 = \left( \frac{\psi(r)}{\psi(t)}\, |\lambda_t|\, |\lambda_r|^{-1} \right)^{\frac{1}{2}}$,
(17) $\beta_9 = \left( \frac{\psi(r)}{\psi(t)}\, |V_z(t)|\, |V_z(r)|^{-1} \right)^{\frac{1}{2\eta}}$,
(18) $\beta_{10} = \left( |V_z(t)|\, |\lambda_t|^{-\eta} \right)^{\frac{1}{2}}$,
(19) $\beta_{11} = \left( |V_z(t)|\, |V_y(t)|^{-\eta} \right)^{\frac{1}{2}}$,
(20) $\beta_{12} = |V_y(t)|\, |\lambda_t|^{-1}$,
where μ is the mean value of $Y_r$ and $\psi(t)$ is the number of pixels processed from the beginning up to position t. $p(Y^{(t)} \,|\, M)$ is the posterior probability of the model (1), and $p(Y_r \,|\, Y^{(r-1)})$ is the prediction probability, both defined in Haindl (Citation2012).
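
Continuing the sketch above, a few of the simpler invariants can be evaluated directly from the accumulated statistics of one spectral plane; the selection (α2, α3, β3) and all names are purely illustrative.

```python
import numpy as np

def car_invariants(Y, shifts, V_y, V_zy, V_z, gamma_hat, lam):
    """Illustrative evaluation of the invariants of Eqs. (6), (7) and (11)
    for a single spectral plane (scalar lambda)."""
    rows, cols = Y.shape
    eta = len(shifts)
    margin = max(max(abs(dr) for dr, _ in shifts),
                 max(abs(dc) for _, dc in shifts))
    # data vector Z_t at the last processed pixel t
    r1, r2 = rows - 1, cols - margin - 1
    Z_t = np.array([Y[r1 - dr, r2 - dc] for dr, dc in shifts])
    alpha2 = 1.0 + Z_t @ np.linalg.solve(V_z, Z_t)                # Eq. (6)
    alpha3 = 0.0                                                  # Eq. (7): normalised prediction errors
    for i in range(margin, rows):
        for j in range(margin, cols - margin):
            Z = np.array([Y[i - dr, j - dc] for dr, dc in shifts])
            e = Y[i, j] - gamma_hat @ Z
            alpha3 += e * e / lam
    beta3 = np.log(np.linalg.det(V_z)) - eta * np.log(lam)        # Eq. (11)
    return alpha2, alpha3, beta3
```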

We used a neighbourhood $I_r$ of the sixth order (see Figure 1), where $\eta = 14$, r = 0 corresponds to the prior, and t equals the last pixel in the image. All invariants (6)–(20) were computed on all spectral planes and concatenated into the feature vector, as shown in the diagram in Figure 2. The CAR model and the colour invariant feature vector were computed on K = 5 Gaussian pyramid levels and in 3 directions, and the features were again concatenated. Finally, the feature vectors were compared with the fuzzy contrast FC3 introduced by Santini and Jain (Citation1999). Downscaling in the Gaussian pyramid is possible as long as the image provides sufficient resolution; for lower-resolution images, it may be necessary to use K = 4 levels of the Gaussian pyramid, which was also tested. When the Karhunen–Loève transformation preceded the CAR feature computation, the features are denoted by the '-KL' suffix.
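
The multi-resolution part of the pipeline can be sketched as follows, assuming the two helper functions above, a hypothetical sixth-order shift set, and the scikit-image Gaussian pyramid; the paper additionally analyses 3 directions, concatenates all spectral planes, and compares the resulting vectors with the FC3 fuzzy contrast, none of which is reproduced here.

```python
import numpy as np
from skimage.transform import pyramid_gaussian

# Hypothetical causal sixth-order neighbourhood with eta = 14 shifts; the
# exact shift set used in the paper is the one drawn in Figure 1.
SHIFTS = [(0, 1), (0, 2), (0, 3), (1, -2), (1, -1), (1, 0), (1, 1), (1, 2),
          (2, -2), (2, -1), (2, 0), (2, 1), (2, 2), (3, 0)]

def car_feature_vector(plane, levels=5):
    """Concatenate the CAR invariants over K Gaussian pyramid levels
    for a single (decorrelated) spectral plane."""
    feats = []
    for level in pyramid_gaussian(plane, max_layer=levels - 1, downscale=2):
        stats = car_statistics(np.asarray(level), SHIFTS)
        feats.extend(car_invariants(np.asarray(level), SHIFTS, *stats))
    return np.array(feats)
```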

Figure 2. The texture analysis algorithm flowchart uses 2D random field models; the K-L transformation step is optional.

3. Frequented alternative features

Hundreds of textural features have been published, and testing all of them on the extensive UTIA BTF wood database (426 465 wood images, 260 TB of data) is infeasible. Hence, we compare the CAR features with the following most frequented alternatives, each compared with the distance measure suggested by its authors.

3.1. Histogram based features

The most straightforward features used in this study are based on histograms of colours or intensity values. Although these features cannot be considered proper textural features, because they do not describe the spatial relations that are the critical texture properties, their advantages are robustness to various geometrical transformations and fast, easy implementation. The cumulative histogram proposed in Stricker and Orengo (Citation1995) is defined as the distribution function of an image histogram. The i-th bin $H_i$ is computed as $H_i = \sum_{\ell \leq i} h_\ell$, where $h_\ell$ is the ℓ-th bin of the ordinary histogram. The distance between two cumulative histograms is computed in the $L_1$ metric. The cumulative histogram is more robust than the ordinary histogram because a small intensity change, characterized by a one-bin shift in the ordinary histogram, has only a negligible effect on the cumulative histogram.
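
For illustration, a cumulative histogram and its L1 comparison might look as follows; the 256-bin range for 8-bit data is an assumption of this sketch.

```python
import numpy as np

def cumulative_histogram(channel, bins=256):
    """Cumulative colour histogram (Stricker & Orengo, 1995): the distribution
    function of the ordinary, normalised histogram of one channel."""
    h, _ = np.histogram(channel, bins=bins, range=(0, 256), density=True)
    return np.cumsum(h)

def l1_distance(a, b):
    """L1 distance between two cumulative histograms."""
    return np.abs(a - b).sum()
```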

3.2. Gabor features

The Gabor filters introduced and used by Bovik (Citation1991), Randen and Husøy (Citation1999), Grigorescu et al. (Citation2002), Han and Ma (Citation2007), and Li et al. (Citation2009) can be considered orientation- and scale-tuneable edge and line (bar) detectors, and the statistics of Gabor filter responses in a given region are used to characterize the underlying texture information. A two-dimensional Gabor function $g(r): \mathbb{R}^2 \to \mathbb{C}$ can be specified as $g(r) = \frac{1}{2\pi\sigma_{r_1}\sigma_{r_2}} \exp\!\left[ -\frac{1}{2}\left( \frac{r_1^2}{\sigma_{r_1}^2} + \frac{r_2^2}{\sigma_{r_2}^2} \right) + 2\pi i U r_1 \right]$, where $\sigma_{r_1}, \sigma_{r_2}, U$ are filter parameters. The convolution of a texture image and a Gabor filter extracts the edges of a given frequency and orientation range. The whole filter set is obtained by four dilations and six rotations of the function g(r). The filter set is designed so that the Fourier transformations of the filters cover most of the image spectrum; see Manjunath and Ma (Citation1996) for details. Finally, given a single spectral plane with values $Y_{r,j}$, $r \in I$, its Gabor wavelet transform is defined as $W_{k\phi,j}(r_1, r_2) = \int\!\!\int Y_{(u_1,u_2),j}\, g_{k\phi}^{*}(r_1 - u_1, r_2 - u_2)\, du_1\, du_2$, where $(\cdot)^{*}$ indicates the complex conjugate, and ϕ and k are the orientation and scale of the filter. The Gabor features are defined as the mean $\mu_{k\phi,j}$ and the standard deviation $\sigma_{k\phi,j}$ of the magnitude of the filter responses $W_{k\phi,j}$. The Gabor features of colour images were computed either on grey images or on each spectral plane separately and concatenated to form a feature vector. The distance between feature vectors is measured by the $L_{1\sigma}$ metric, where each feature is normalized by its standard deviation (estimated from all datasets) before the $L_1$ metric is computed.
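
A sketch of the Gabor feature extraction is shown below; the particular filter frequencies only approximate the four-dilation, six-rotation bank of Manjunath and Ma (Citation1996), and the function names are ours.

```python
import numpy as np
from scipy.ndimage import convolve
from skimage.filters import gabor_kernel

def gabor_features(gray, frequencies=(0.05, 0.1, 0.2, 0.4), n_orient=6):
    """Mean and standard deviation of the Gabor response magnitudes over a
    bank of scales (dilations) and orientations (rotations)."""
    gray = np.asarray(gray, dtype=float)
    feats = []
    for f in frequencies:                      # four scales (assumed values)
        for k in range(n_orient):              # six orientations
            kern = gabor_kernel(f, theta=k * np.pi / n_orient)
            resp_re = convolve(gray, np.real(kern), mode='wrap')
            resp_im = convolve(gray, np.imag(kern), mode='wrap')
            mag = np.hypot(resp_re, resp_im)   # |W_{k phi}(r)|
            feats += [mag.mean(), mag.std()]   # mu and sigma per filter
    return np.array(feats)

def l1_sigma(a, b, sigma):
    """L1 distance with every feature normalised by its dataset std. deviation."""
    return np.abs((a - b) / sigma).sum()
```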

Another extension of the Gabor filters to colour textures by Jain and Healey (Citation1998) is based on adding a chromatic antagonism, while the Gabor filters themselves model the spatial antagonism. The opponent Gabor features consist of a monochrome part, $\eta_{k\phi,j} = \sqrt{\sum_r W_{k\phi,j}^2(r)}$, where $W_{k\phi,j}$ is the response to the Gabor filter of orientation ϕ and scale k on the j-th spectral plane of the image. The opponent part of the features is $\psi_{k\phi\phi',j\ell} = \sum_r \left( \frac{W_{k\phi,j}(r)}{\eta_{k\phi,j}} - \frac{W_{k\phi',\ell}(r)}{\eta_{k\phi',\ell}} \right)^2$ for all spectral planes j, ℓ with $j \neq \ell$ and $|\phi - \phi'| \leq 1$. (The opponent features could also be expressed as the correlation between spectral plane responses.) The distance between feature vectors is measured by the $L_{2\sigma}$ metric, where each feature is normalized by its standard deviation (estimated from all datasets) before the $L_2$ metric is computed.
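
The opponent part can be sketched as below, assuming the per-plane Gabor responses are already available in a dictionary keyed by (scale, orientation, plane); the dictionary layout, the restriction to a common scale, and the names are illustrative assumptions rather than the authors' definition.

```python
import numpy as np

def opponent_gabor_features(W):
    """Opponent Gabor features: squared differences of energy-normalised
    responses across spectral planes at neighbouring orientations.
    W[(k, phi, j)] is a real-valued response array (illustrative layout)."""
    # monochrome energies eta_{k phi, j}
    eta = {key: np.sqrt((resp ** 2).sum()) for key, resp in W.items()}
    feats = []
    for (k, phi, j) in sorted(W):
        for (k2, phi2, l) in sorted(W):
            # different spectral planes, neighbouring orientations,
            # common scale (the latter is an assumption of this sketch)
            if j != l and abs(phi - phi2) <= 1 and k == k2:
                d = (W[(k, phi, j)] / eta[(k, phi, j)]
                     - W[(k2, phi2, l)] / eta[(k2, phi2, l)])
                feats.append((d ** 2).sum())
    return np.array(feats)
```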

3.3. Local binary patterns

Local Binary Patterns (Ojala et al., Citation2002) are histograms of texture micro-patterns. For each pixel, a circular neighbourhood around the pixel is sampled; P is the number of samples and R is the circle's radius. The sampled point values are thresholded by the central pixel value, and the pattern number is formed as
(21) $LBP_{P,R} = \sum_{s=0}^{P-1} \operatorname{sgn}(Y_s - Y_c)\, 2^s$,
where sgn is the sign function, $Y_s$ is the grey value of the sampled pixel, and $Y_c$ is the grey value of the central pixel. Subsequently, the histogram of patterns is computed. Because of the thresholding, the features are invariant to any monotonic grey-scale change. The multiresolution analysis is done by growing the circular neighbourhood size. However, complex patterns do not have enough occurrences in a texture; therefore, the uniform LBP features $LBP^u$ comprise only a subset of the patterns. All LBP histograms were normalized to have unit $L_1$ norm. As the authors suggested, the similarity between LBP feature vectors is measured using the Kullback–Leibler divergence. We tested the features $LBP_{8,1+8,3}$, which are a combination of features with radii 1 and 3, and the uniform version $LBP^u_{16,2}$ with radius 2. They were computed either on grey images or on each spectral plane of the colour image and concatenated. LBP features exist in various modifications (Ahonen et al., Citation2009; Fu & Wei, Citation2008; Heikkilä et al., Citation2009; Khellah, Citation2011; Liao et al., Citation2009; Nanni et al., Citation2012; Zhang et al., Citation2010), but they behave similarly; hence, we chose two variants as representatives of the whole group, since no comparison can be considered exhaustive without the LBP strategy.
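
A sketch of the LBP histogram features and their Kullback–Leibler comparison, using the scikit-image implementation, is given below; $LBP_{8,1+8,3}$ then corresponds to concatenating the histograms obtained for (P, R) = (8, 1) and (8, 3). The smoothing constant is an assumption added to avoid empty bins.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray, P=8, R=1, uniform=False):
    """Normalised histogram of LBP_{P,R} codes (Eq. 21); `uniform=True`
    selects the uniform-pattern variant LBP^u."""
    method = 'uniform' if uniform else 'default'
    codes = local_binary_pattern(gray, P, R, method=method)
    n_bins = P + 2 if uniform else 2 ** P
    h, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins))
    return h / h.sum()                          # unit L1 norm

def kl_divergence(p, q, eps=1e-10):
    """Kullback-Leibler divergence between two LBP histograms."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))
```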

3.4. Discussion

We excluded fashionable neural net features due to their uncompetitiveness on the often restricted test data of practical applications. They are less well understood, computationally wasteful, and dependent on the net topology, and thus cannot be regarded as well-defined textural features. Moreover, we use only one to six training images, which is insufficient for robust neural net learning. The MRF features outperformed deep Convolutional Neural Networks (CNN) in the bark recognition problem even on extensive training data, as demonstrated in Remeš and Haindl (Citation2019). This result is understandable because MRF features are descriptive, while neural net features are discriminative. Similar results were presented in the extensive comparison of multilayer NN marble texture classification with 17 LBP feature variants and three types of key-point texture descriptors in Sidiropoulos et al. (Citation2021); in their results, the CNN features never outperform all these alternatives. Another comparison, where the NNs ScatNet, PCANet, FV-AlexNet, and RandNet do not outperform LBP features on the Outex, CUReT, ALOT, and KTHTIPS data, can be consulted in Liu et al. (Citation2016). However, it would be interesting to include tests with a low number of training samples, which would reveal the robustness of features to various conditions, as performed in Burghouts and Geusebroek (Citation2009) and Vácha et al. (Citation2011).

Table 1 shows the sizes of the feature vectors of the compared textural features. The fastest features to compute are the colour histogram and LBP, followed by the 2D CAR features (about five times slower) and both the Gabor and opponent Gabor features (about nine times slower).

Table 1. The sizes of feature vectors of compared textural features.

4. Experiments

We tested the scale sensitivity of the selected textural features on two databases:

(i)

University of East Anglia (UEA) Uncalibrated Image Database was introduced by Finlayson et al. (Citation2000) and consists of patterns under different illumination spectra,

(ii)

wooden BTF measurements from the extensive UTIA BTF database, introduced by Haindl et al. (Citation2015), composed of material images under varying illumination directions.

In both experiments, all images were scaled down to 95%, 90%, 85%, …, 50% of their original size, and regions with identical pixel resolution were cropped. Consequently, an image at scale 50% covers double the extent of the original texture image, but with half the detail of scale 100% (see examples in Figures 5 and 11). The training set contains only images at the original scale, and the classification accuracy was tested for all scales separately.
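
The scale simulation can be sketched as follows; the crop size is a placeholder value, and the centred crop and channels-last colour layout are our assumptions (the paper only states that regions of identical pixel resolution were cropped).

```python
import numpy as np
from skimage.transform import rescale

def scaled_crops(image, crop=(128, 128), scales=np.arange(1.00, 0.45, -0.05)):
    """Downscale an image to 100%..50% of its original size and cut a crop of
    fixed pixel resolution, so the 50% crop covers double the linear extent of
    the 100% crop.  The crop must still fit into the smallest scaled image."""
    crops = {}
    for s in scales:
        small = rescale(image, s, anti_aliasing=True, channel_axis=-1)
        r0 = (small.shape[0] - crop[0]) // 2    # centred crop (assumed)
        c0 = (small.shape[1] - crop[1]) // 2
        crops[round(float(s), 2)] = small[r0:r0 + crop[0], c0:c0 + crop[1]]
    return crops
```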

Figure 5. The appearance of patterns from the UEA database with varying scales, from the left, the scale factor is 50%, 60%, 75%, 90%, and 100%.

Figure 11. The appearance of two veneers from the Wood UTIA BTF database in varying scales, from the left, the scale factor is 50%, 60%, 75%, 90%, and 100%.

Training images for each material were randomly selected from the training set, and the remaining images were classified using the Nearest Neighbour (1-NN) classifier. The number of training images went from 1 to 6, and the results were averaged over 10³ random selections of training images. Even single training samples were randomly selected, so they could have different illumination conditions for each material, which makes recognition more challenging.
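
A sketch of this evaluation protocol under our own data-layout assumptions (a feature matrix, a label vector, and a boolean mask marking the original-scale images eligible for training) is given below; it is not the authors' code.

```python
import numpy as np

def evaluate_1nn(features, labels, train_pool, n_train=1, n_runs=1000,
                 dist=lambda a, b: np.abs(a - b).sum(), seed=None):
    """Average 1-NN accuracy over random selections of training images.
    `features` is (N, D), `labels` is (N,), and `train_pool` is a boolean
    mask of the original-scale images the training samples may come from."""
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    accuracies = []
    for _ in range(n_runs):                       # 10^3 runs in the paper
        train_idx = np.concatenate([
            rng.choice(np.where((labels == c) & train_pool)[0],
                       size=n_train, replace=False)
            for c in classes])
        test_idx = np.setdiff1d(np.arange(len(labels)), train_idx)
        correct = 0
        for i in test_idx:
            d = [dist(features[i], features[j]) for j in train_idx]
            correct += labels[train_idx[int(np.argmin(d))]] == labels[i]
        accuracies.append(correct / len(test_idx))
    return float(np.mean(accuracies))
```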

4.1. University of East Anglia uncalibrated image database

The UEA dataset contains 28 textile designs captured with six different devices (four colour cameras and two colour scanners); the camera images were captured under three different illumination spectra, which sums up to 392 images in total. Examples of the images are displayed in Figure 3, variations of their appearance in Figure 4, and different scales in Figure 5. The UEA images include even non-linear relations of their values caused by different processing in the acquisition devices (gamma correction), and no light calibration was performed; see Finlayson et al. (Citation2000) for more details. Since the UEA database images include some scale variations, we corrected this and rescaled the images to the same scale and resolution. In total, we used 4312 images with 332×275 resolution.

Figure 3. Examples of patterns included in the UEA database.

Figure 4. The appearance of patterns from the UEA database with varying illumination spectra (3 columns on the left) and additional acquisition devices (2 columns from the right).

Table 2 summarizes the recognition accuracy of the compared features. The 2D CAR features are superior for all tested numbers of random training images per material. The classification accuracy of 2D CAR-KL averaged over all scale variations goes from 48.4% for one training sample to 70.2% for six training samples per class. The standard deviation is less than 4 for one training sample and less than 3 for six training samples for all features. The 2D CAR model achieved slightly better results without the Karhunen–Loève transformation; however, we include 2D CAR-KL in the more detailed analysis since it had better classification accuracy in the experiments reported in Haindl and Vacha (Citation2015). Moreover, the 2D CAR-KL model on K = 4 levels of the Gaussian pyramid achieved lower accuracy than the standard K = 5 levels because the images have sufficient resolution. The only comparable features are the opponent Gabor features, which achieved performance similar to 2D CAR-KL, with slightly lower accuracy for one training sample. Colour histograms suffer from their sensitivity to colour changes, resulting in low performance (from 15.3% to 32.6%). Even though the colour histograms are robust to scale variation (because they do not describe spatial relations), they are unable to recognize materials under different illumination spectra. The LBP features did not perform satisfactorily either.

Table 2. Classification accuracy [%] averaged over all scales and illumination conditions on the UEA dataset.

A detailed comparison of the scale variation is displayed in Figure 6, where we can see that classification accuracy significantly increases as the scales of the training and test samples approach each other. The only exception is the colour histogram features, which cannot recognize the same materials even at the same scale due to insufficient robustness to the illumination spectra changes (regardless of the number of random training samples per class). The 2D CAR-KL features again achieved the best results. Classification accuracy for one training sample starts at 23.1% for the half scale factor and goes to 64.9% for the same scale (15% better than the alternative features). Opponent Gabor features were slightly better for the highest difference in the scale factor. A similar situation applies to six training samples, where classification accuracy goes from 36.9% to 90.0% for the 2D CAR-KL features. However, the opponent Gabor features performed better with a significant difference in scale factor (0.5 – 0.7).

Figure 6. The illustration of the classification accuracy [%] progresses with decreasing scale differences among training and test sets (UEA dataset). On the left, for one training sample, and on the right, for six training samples per class.

4.1.1. Across scales

Figure 7 shows the classification accuracy across different training and test scale combinations, with a single training sample, averaged over 10³ random selections. As expected, classification accuracy decreases with a larger difference in scale factors. It is worth noting that the last rows of the images in Figure 7 correspond to the 2D CAR-KL and opponent Gabor graphs on the left in Figure 6. Interestingly, recognition accuracy on the diagonal decreases as the scale factor goes from 0.5 to 1. This decrease may be caused by the fact that images with a scale factor of 0.5 (left column in Figure 5) cover a larger area of the original material (although being subsampled), so they contain more comprehensive information, and the extracted features can be more discriminative.

Figure 7. The classification accuracy [%] for all combinations of scales among training and test sets on the UEA dataset, one training sample per class was used.

This discrepancy is more apparent in Figure 8, which compares the original training at a scale factor of 1 with training swapped to a scale factor of 0.5. We can see a 15% difference in recognition accuracy between training at scales 1 and 0.5 for the 2D CAR-KL and opponent Gabor features when the test scale is the same as the training scale. The difference in recognition accuracy decreases as the test scale moves further from the training scale, and there is no difference for training at a scale factor of 0.5 with a test scale factor of 1 or vice versa.

Figure 8. Classification accuracy [%] on the UEA dataset with one training sample, on the left for the training sample with the scale factor of 1 and on the right with the scale factor of 0.5.

4.2. Wood UTIA BTF database

This study's Wood UTIA BTF database contains veneers from sixty-five varied European, African, and American wood species; example images are shown in Figure 9. The UTIA BTF database was measured using the high-precision robotic gonioreflectometer described in Haindl et al. (Citation2012), which consists of independently controlled arms with a camera and a light. Its parameters, such as an angular precision of 0.03 degrees, a spatial resolution of 1000 DPI, and selective spatial measurement, classify this gonioreflectometer as a state-of-the-art device. Each wood sample was measured in 81 viewing positions times 81 illumination positions, resulting in 6561 images per sample (4 TB of data). Because of the substantial storage requirements, we took only the images for one camera position (top view), and we selected 35 of the 81 illumination directions (1 image with a tilt of 0 degrees, 12 images with 30 degrees, 10 images with 60 degrees, and 12 images with 75 degrees). The images uniformly represent the space of possible illumination directions; see the example images in Figure 10. Images at different scales are displayed in Figure 11. In total, we used 25,025 images with 816×802 resolution.

Figure 9. Examples of wood veneers included in the Wood UTIA BTF database.

Figure 10. The illustration of the appearance of four veneers from the Wood UTIA BTF database in varying illumination directions. The left column is illuminated from the surface normal, and the direction of illumination tilt increases to the right: 0, 30, 60, 60, and 75 degrees, illumination azimuth is 0, 90, 180, 252, and 345 degrees, respectively.

Table 3 summarizes the recognition accuracy of the compared features. The 2D CAR features are superior for all tested numbers of random training images per material. The classification accuracy of 2D CAR-KL averaged over all scale variations goes from 45.4% for one training sample to 69.4% for six training samples per class. The standard deviation is less than 4 for one training sample and less than 2 for six training samples for all features. The 2D CAR-KL model on K = 4 levels of the Gaussian pyramid achieved lower accuracy than the standard K = 5 levels, as the images have sufficient resolution. The best alternative is the opponent Gabor features; however, their accuracy is more than 20 percentage points lower than 2D CAR-KL. Neither the colour histogram nor the LBP features performed satisfactorily, since their recognition accuracy is less than 13.3% and 25.6% for 1 and 6 training samples, respectively. This is because binarized LBP micro-patterns are sensitive to the illumination direction, as confirmed by Vácha and Haindl (Citation2012). Additionally, Haindl and Vácha (Citation2017) show that LBP features are susceptible to even minor scale variations.

Table 3. Classification accuracy [%] averaged over all scales and illumination angles on the Wood UTIA BTF dataset.

The detailed comparison of the scale variation is displayed in Figure 12. The classification accuracy increases as the scales of the training and test samples get closer (except for the histogram features). The best results were again achieved by the 2D CAR-KL features, where the classification accuracy for one training sample starts at 22.7% for the half scale and goes to 60.9% for the same scale, which is more than 10% better than the opponent Gabor features for all scale factors. A similar situation holds for six training samples, where the classification accuracy goes from 31.5% to 91.0% for the 2D CAR-KL features, again more than 10% better than the opponent Gabor features.

Figure 12. Classification accuracy [%] progresses with decreasing scale differences among training and test sets (Wood UTIA BTF). On the left, for one training sample, and on the right, for six training samples per class.

4.2.1. Across scales

Figure 13 shows the classification accuracy across different training and test scale combinations with a single training sample (averaged over 10³ random selections). As expected, classification accuracy decreases with a larger difference in scale factors. It is worth noting that the last rows of the images in Figure 13 correspond to the 2D CAR-KL and opponent Gabor graphs on the left in Figure 12. Similarly to the UEA dataset, recognition accuracy on the diagonal decreases as the scale factor goes from 0.5 to 1. The reason is that images with a scale factor of 0.5 (left column in Figure 11) contain more comprehensive information, and the extracted features are more discriminative.

Figure 13. The classification accuracy [%] for all combinations of scales among training and test sets on the Wood UTIA BTF dataset, one training sample per class was used.

The comparison between the original training at a scale factor of 1 and training swapped to a scale factor of 0.5 is displayed in Figure 14. The difference in recognition accuracy between training at scales 1 and 0.5 is 9% for the 2D CAR-KL and 6% for the opponent Gabor features. Again, the difference in recognition accuracy decreases as the test scale moves further from the training scale.

Figure 14. Classification accuracy [%] on the Wood UTIA BTF dataset with one training sample, on the left for the training sample with the scale factor of 1 and on the right with scale factor of 0.5.

4.2.2. Illumination tilt

The additional experiment utilizes the different illumination angles in the Wood UTIA BTF database and splits the classification accuracy by illumination tilt. The single training sample was fixed to illumination from the surface normal direction (0-degree tilt), and the remaining images were classified. The classification accuracy is averaged for each illumination tilt: 30, 60, and 75 degrees (12, 10, and 12 images, respectively). Training and test sets have the same scaling factor of 1. The results are displayed in Table 4, where the classification accuracy decreases as the illumination direction moves further from the training sample illumination. The average results in the last column roughly correspond to the left graph in Figure 12 for a test scale factor of 1. As the scale variation is absent, the results of the LBP and opponent Gabor features are comparable.

Table 4. Classification accuracy [%] is shown for different illumination tilts (declination angle from the surface normal) without any scale variation (Wood UTIA BTF).

5. Conclusion

The results indicate that the Markovian illumination invariant texture features (2D CAR), based on the Markovian descriptive model, are the most robust textural features for realistic texture classification under natural conditions, when the learning and classified textures differ in scale and illumination properties. The 2D CAR features outperformed the alternative tested textural features, i.e. the Gabor, opponent Gabor, LBP variants, and colour histogram texture features. The 2D CAR statistical features are analytically derived from the underlying descriptive textural model and can be learned efficiently, recursively, and adaptively. An additional advantage is their numerically robust estimation. The method's correct recognition accuracy improvements are between 27% and 44% compared to the LBP features, and up to 25% compared to the opponent Gabor features (the second-best alternative). The worst are the colour histograms, with an accuracy decrease between 35% and 43%. The colour Markovian textural features were also successfully applied elsewhere, e.g. in the recognition of wood veneers using a smartphone camera, reported by Haindl and Vacha (Citation2015), and in tree taxonomy categorization based on bark or coniferous tree needles, as shown in Remeš and Haindl (Citation2019). The presented results apply to recognition with bounded scale variation; fully scale-invariant textural features should be considered when extreme scale variation is expected. However, fully invariant features usually lose some discriminability. Thus, each application must carefully balance invariance with the expected variability and discriminability.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

The Czech Science Foundation project GAČR 19-12340S supported this research.

References

  • Ahonen, T., Matas, J., He, C., & Pietikainen, M. (2009). Rotation invariant image description with local binary pattern histogram fourier features. In SCIA (pp. 61–70). Springer-Verlag, Berlin Heidelberg
  • Bell, S., Upchurch, P., Snavely, N., & Bala, K. (2015). Material recognition in the wild with the materials in context database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3479–3487). IEEE
  • Bovik, A. (1991). Analysis of multichannel narrow-band filters for image texture segmentation. IEEE Transactions on Signal Processing, 39(9), 2025–2043. https://doi.org/10.1109/78.134435
  • Burghouts, G. J., & Geusebroek, J. -M. (2009). Material-specific adaptation of color invariant features. Pattern Recognition Letters, 30(3), 306–313. https://doi.org/10.1016/j.patrec.2008.10.005
  • Finlayson, G., Schaefer, G., & Tian, G. (2000). The UEA uncalibrated colour image database. Technical Report SYS-C00, School of Information System, University of East Anglia, Norwich, UK.
  • Fu, X., & Wei, W. (2008). Centralized binary patterns embedded with image Euclidean distance for facial expression recognition. In Natural Computation, 2008. ICNC '08. Fourth International Conference on Vol. 4, (pp. 115–119).
  • Gibert, X., Patel, V. M., & Chellappa, R. (2015). Material classification and semantic segmentation of railway track images with deep convolutional neural networks. In 2015 IEEE International Conference on Image Processing (ICIP) (pp. 621–625). IEEE.
  • Grigorescu, S. E., Petkov, N., & Kruizinga, P. (2002). Comparison of texture features based on gabor filters. IEEE Transactions on Image Processing, 11(10), 1160–1167. https://doi.org/10.1109/TIP.2002.804262
  • Haindl, M. (2012). Visual data recognition and modeling based on local markovian models. In L. Florack, R. Duits, G. Jongbloed, M. C. Lieshout, and L. Davies (Eds.), Mathematical methods for signal and image analysis and representation, volume 41 of Computational imaging and vision, chapter 14, (pp. 241–259). Springer London. https://doi.org/10.1007/978-1-4471-2353-8_14
  • Haindl, M. (2023). Bidirectional texture function modeling, chapter 28, (pp. 1023–1064). Springer International Publishing, Gewerbestrasse 11, 6330 Cham, Switzerland.
  • Haindl, M., & Filip, J. (2013). Visual texture. Advances in computer vision and pattern recognition. Springer-Verlag London.
  • Haindl, M., Filip, J., & Vávra, R. (2012). Digital material appearance: The curse of tera-bytes. ERCIM News, 90, 49–50.
  • Haindl, M., Mikeš, S., & Kudo, M. (2015). Unsupervised surface reflectance field multi-segmenter. In G. Azzopardi and N. Petkov, (Eds.), Computer analysis of images and patterns, Vol. 9256 of Lecture notes in computer science, (pp. 261 – 273). Springer International Publishing.
  • Haindl, M., & Vacha, P. (2015). Wood veneer species recognition using Markovian textural features. In G. Azzopardi and N. Petkov, (Eds.), Computer analysis of images and patterns, volume 9256 of Lecture notes in computer science, (pp. 300–311). Springer International Publishing.
  • Haindl, M., & Vácha, P. (2017). Scale sensitivity of textural features. In C. Beltrán-Castañón, I. Nyström, and F. Famili (Eds.), Progress in pattern recognition, image analysis, computer vision, and applications: 21st Iberoamerican Congress, CIARP 2016, Lima, Peru, November 8–11, 2016, Proceedings, Vol. 10125 of Lecture notes in computer science, (pp. 84 – 92). Gewerbestrasse 11, Cham, CH-6330, Switzerland. Springer International Publishing AG.
  • Han, J., & Ma, K.-K. (2007). Rotation-invariant and scale-invariant gabor features for texture image retrieval. Image and Vision Computing, 25(9), 1474–1481. https://doi.org/10.1016/j.imavis.2006.12.015
  • Heikkilä, M., Pietikäinen, M., & Schmid, C. (2009). Description of interest regions with local binary patterns. Pattern Recognition, 42(3), 425–436. https://doi.org/10.1016/j.patcog.2008.08.014
  • Hlaing, C. S., & Zaw, S. M. M. (2018). Tomato plant diseases classification using statistical texture feature and color feature. In 2018 IEEE/ACIS 17th International Conference on Computer and Information Science (ICIS) (pp. 439–444). IEEE.
  • Jain, A. K., & Healey, G. (1998). A multiscale representation including opponent color features for texture recognition. IEEE Transactions on Image Processing, 7(1), 124–128. https://doi.org/10.1109/83.650858
  • Khellah, F. (2011). Texture classification using dominant neighborhood structure. IEEE Transactions on Image Processing, 20(11), 3270–3279. https://doi.org/10.1109/TIP.2011.2143422
  • Li, Z., Liu, G., Jiang, H., & Qian, X. (2009). Image copy detection using a robust gabor texture descriptor. In Proceedings of the First ACM Workshop on Large-scale Multimedia Retrieval and Mining, LS-MMRM '09 (pp. 65–72). New York, NY, USA. ACM.
  • Liao, S., Law, M. W. K., & Chung, A. C. S. (2009). Dominant local binary patterns for texture classification. IEEE Transactions on Image Processing, 18(5), 1107–1118. https://doi.org/10.1109/TIP.2009.2015682
  • Liu, L., Chen, J., Fieguth, P., Zhao, G., Chellappa, R., & Pietikainen, M. (2018). A survey of recent advances in texture representation. arXiv preprint arXiv:1801.10324.
  • Liu, L., Fieguth, P., Wang, X., Pietikäinen, M., & Hu, D. (2016). Evaluation of lbp and deep texture descriptors with a new robustness benchmark. In European Conference on Computer Vision, (pp. 69–86). Springer.
  • Luo, Q., Su, J., Yang, C., Silven, O., & Liu, L. (2022). Scale-selective and noise-robust extended local binary pattern for texture classification. Pattern Recognition, 132, 108901. https://doi.org/10.1016/j.patcog.2022.108901
  • Manjunath, B. S., & Ma, W. Y. (1996). Texture features for browsing and retrieval of image data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8), 837–842. https://doi.org/10.1109/34.531803
  • Muzaffar, A. W., Riaz, F., Abuain, T., Abu-Ain, W. A. K., Hussain, F., Farooq, M. U., & Azad, M. A. (2023). Gabor contrast patterns: A novel framework to extract features from texture images. IEEE Access, 11, 60324–60334. https://doi.org/10.1109/ACCESS.2023.3280053
  • Nanni, L., Lumini, A., & Brahnam, S. (2012). Survey on LBP based texture descriptors for image classification. Expert Systems with Applications, 39(3), 3634–3641. https://doi.org/10.1016/j.eswa.2011.09.054
  • Ojala, T., Pietikäinen, M., & Mäenpää, T. (2002). Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7), 971–987. https://doi.org/10.1109/TPAMI.2002.1017623
  • Randen, T., & Husøy, J. H. (1999). Filtering for texture classification: A comparative study. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(4), 291–310. https://doi.org/10.1109/34.761261
  • Remeš, V., & Haindl, M. (2019). Bark recognition using novel rotationally invariant multispectral textural features. Pattern Recognition Letters, 125, 612–617. https://doi.org/10.1016/j.patrec.2019.06.027
  • Roy, S. K., Bhattacharya, N., Chanda, B., Chaudhuri, B. B., & Ghosh, D. K. (2018). Fwlbp: A scale invariant descriptor for texture classification. arXiv preprint arXiv:1801.03228.
  • Santini, S., & Jain, R. (1999). Similarity measures. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(9), 871–883. https://doi.org/10.1109/34.790428
  • Shivashankar, S., Kudari, M., & Hiremath, P. S. (2018). Galois field-based approach for rotation and scale invariant texture classification. International Journal of Image, Graphics and Signal Processing (IJIGSP), 10(9), 56–64. https://doi.org/10.5815/ijigsp
  • Shu, X., Pan, H., Shi, J., Song, X., & Wu, X.-J. (2022). Using global information to refine local patterns for texture representation and classification. Pattern Recognition, 131, 108843. https://doi.org/10.1016/j.patcog.2022.108843
  • Sidiropoulos, G. K., Ouzounis, A. G., Papakostas, G. A., Sarafis, I. T., Stamkos, A., & Solakis, G. (2021). Texture analysis for machine learning based marble tiles sorting. In 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC) (pp. 0045–0051). IEEE.
  • Simon, P., & Uma, V. (2018). Review of texture descriptors for texture classification. In Data Engineering and Intelligent Computing, (pp. 159–176). Springer.
  • Stricker, M. A., & Orengo, M. (1995). Similarity of color images. Vol. 2420, (pp. 381–392). SPIE.
  • Vácha, P., & Haindl, M. (2012). Texture recognition using Robust Markovian features. In Salerno, E., Çetin, A., and Salvetti, O. (Eds.), Computational intelligence for multimedia understanding, volume 7252 of lecture notes in computer science, (pp. 126–137). Springer Berlin/Heidelberg. https://doi.org/10.1007/978-3-642-32436-9_11
  • Vácha, P., & Haindl, M. (2022). Textural features sensitivity to scale and illumination variations. In Bădică, C., Treur, J., Benslimane, D., Hnatkowska, B., and Krótkiewicz, M., (Eds.), Advances in computational collective intelligence Vol. 1653 of Communications in computer and information science (pp. 237–249), Gewerbestrasse 11, Cham, CH-6330, Switzerland. Springer International Publishing.
  • Vácha, P., Haindl, M., & Suk, T. (2011). Colour and rotation invariant textural features based on Markov random fields. Pattern Recognition Letters, 32(6), 771–779. https://doi.org/10.1016/j.patrec.2011.01.002
  • Varma, M., & Zisserman, A. (2009). A statistical approach to material classification using image patch exemplars. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(11), 2032–2047. https://doi.org/10.1109/TPAMI.2008.182
  • Veerashetty, S., & Patil, N. B. (2020). Novel lbp based texture descriptor for rotation, illumination and scale invariance for image texture analysis and classification using multi-kernel svm. Multimedia Tools and Applications, 79(15–16), 9935–9955. https://doi.org/10.1007/s11042-019-7345-6
  • Yang, P., Zhang, F., & Yang, G. (2018). Fusing dtcwt and lbp based features for rotation, illumination and scale invariant texture classification. IEEE Access, 6, 13336–13349. https://doi.org/10.1109/ACCESS.2018.2797072
  • Zhang, B., Gao, Y., Zhao, S., & Liu, J. (2010). Local derivative pattern versus local binary pattern: Face recognition with high-order local pattern descriptor. IEEE Transactions on Image Processing, 19(2), 533–544. https://doi.org/10.1109/TIP.2009.2035882