Research Article

Application of Unsupervised Feature Selection in Cashmere and Wool Fiber Recognition

Yaolin Zhu, Xingze Wang, Meihua Gu, Gang Hu & Wenya Li

ABSTRACT

Suitable features are the key to identifying cashmere and wool fibers, and feature selection is an important step in classification. Existing supervised feature selection methods depend on the relationship between fiber features and class labels, so they require labeled samples. To address this limitation, we propose an unsupervised feature selection method based on k-means clustering, which overcomes the difficulty that class labels for fiber features are either unavailable or costly to obtain. First, each normalized subset of fiber features is clustered with the k-means algorithm to obtain its number of clusters, and the clustering quality is evaluated with the Davies-Bouldin (DB) index. Next, the DB value of each feature subset, the correlation among features, and the number of clusters are used as the criteria for selecting the optimal feature subset. Finally, the optimal feature subset obtained by the unsupervised feature selection algorithm is fed into a support vector machine for automatic identification and classification of the two fibers. The experimental results show that the method achieves a recognition rate of 97.25%, which verifies that the unsupervised feature selection method based on k-means clustering is effective for the recognition of cashmere and wool.
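The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact judgment function, so the sketch simply picks the cluster count with the lowest DB index. The function names, the farthest-point initialization, the min-max normalization, and the candidate cluster counts are all assumptions made for illustration.

```python
import math


def dist(p, q):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(c) / len(points) for c in zip(*points)]


def normalize(rows):
    """Min-max scale each feature column to [0, 1] (assumed normalization)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def kmeans(points, k, iters=50):
    """Lloyd's k-means with a deterministic farthest-point start (assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(centroid(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def davies_bouldin(points, labels):
    """Davies-Bouldin index; lower means tighter, better-separated clusters."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    if len(groups) < 2:
        return math.inf
    centers = {lab: centroid(g) for lab, g in groups.items()}
    scatter = {
        lab: sum(dist(p, centers[lab]) for p in g) / len(g)
        for lab, g in groups.items()
    }
    total = 0.0
    for i in groups:
        worst = 0.0
        for j in groups:
            if i != j:
                sep = dist(centers[i], centers[j])
                ratio = (scatter[i] + scatter[j]) / sep if sep > 0 else math.inf
                worst = max(worst, ratio)
        total += worst
    return total / len(groups)


def best_cluster_count(points, k_values=(2, 3, 4)):
    """Return (k, DB) for the candidate k with the lowest DB index."""
    scores = {k: davies_bouldin(points, kmeans(points, k)) for k in k_values}
    best_k = min(scores, key=scores.get)
    return best_k, scores[best_k]
```

Under these assumptions, calling `best_cluster_count(normalize(rows))` once per candidate feature subset gives a cluster count and a DB value for each subset, which could then be compared across subsets.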

Highlights

  • An unsupervised feature selection algorithm is applied to cashmere and wool fiber recognition, which overcomes the difficulty that class labels are unavailable or costly to obtain.

  • We propose an unsupervised feature selection algorithm based on K-means clustering that takes into account both the influence of features on the classification results and the correlation among features.

  • The K-means clustering algorithm is used to determine the optimal number of clusters for each cashmere and wool feature subset, and a judgment function based on the DB index is then used for feature selection. Finally, for each pair of highly correlated features in the selected subset, one of the two is deleted to complete the feature selection.

  • Cluster analysis based on distance and similarity forms the core of the feature selection algorithm, which reduces redundancy and achieves dimensionality reduction.

  • The proposed algorithm preserves as much of the original information in the sample data as possible. It is also applicable to other unlabeled data.
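The redundancy-removal step mentioned in the highlights (deleting one feature from each highly correlated pair) might look like the sketch below. The source does not state which correlation measure, which threshold, or which member of a pair is dropped, so the use of Pearson correlation, the 0.9 threshold, the keep-the-earlier-feature rule, and the feature names in the usage note are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def drop_correlated(features, threshold=0.9):
    """Keep a feature only if it is not strongly correlated with one already kept.

    features: dict mapping a feature name to its list of sample values.
    threshold: assumed cut-off on |r|; the source does not specify a value.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

For example, with hypothetical features `{"diameter": [1, 2, 3, 4, 5], "diameter_x2": [2, 4, 6, 8, 10], "scale_height": [3, 1, 4, 1, 5]}`, the second feature is a linear copy of the first and is dropped, while the other two are kept.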

Disclosure statement

No potential conflict of interest was reported by the author(s).

Credit contribution statement

Yaolin Zhu: Conceptualization; Data curation; Formal analysis; Funding acquisition; Investigation; Methodology; Project administration; Resources; Writing-original draft; Writing-review & editing. Xingze Wang: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Project administration; Resources; Software; Supervision; Validation; Visualization; Writing-original draft; Writing-review & editing. Meihua Gu: Resources; Validation. Gang Hu: Formal analysis; Resources; Supervision; Validation. Wenya Li: Resources; Supervision.

Data availability statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Additional information

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Natural Science Basic Research Key Program funded by the Shaanxi Provincial Science and Technology Department (No. 2022JZ-35 and No. 2023-JC-ZD-33), the Key Research Program Industrial Textiles Collaborative Innovation Center Project of the Shaanxi Provincial Department of Education (No. 20JY026), and the Science and Technology Plan Project of Yulin City (No. CXY-2020-052).