
A Human-In-One-Loop Active Domain Adaptation Framework for Digit Recognition

Article: 2349410 | Received 26 Mar 2023, Accepted 18 Apr 2024, Published online: 11 May 2024

ABSTRACT

Domain adaptation can effectively enhance a model's performance on target domain data when that data is limited. However, when some target domain labels are obtainable, training the model on source and target domain data simultaneously can degrade performance because of the lower density of target domain data, while indiscriminately labeling a large amount of target domain data requires considerable human effort. To address these issues, this paper proposes a human-in-one-loop active domain adaptation framework based on Target Domain Feature Generation (TDFG). The oracle participates in only one iteration of data labeling, and a target domain classifier takes over all subsequent iterations. An image generator based on multiple CycleGANs forms an iterative co-training mechanism with the classifier, continuously generating more high-quality labeled fake target domain data across iterations to improve the performance of the target domain classifier. A Top-N labeled data selection method with high confidence is devised to select the most accurately predicted data for labeling, reducing the manual labeling workload. The framework achieves an average accuracy of 0.8869 on six domain pairs, nearly double that of the classical domain adaptation method DSN, while requiring only a small amount of manual labeling.

Introduction

The problem of digit classification arises in many application scenarios, such as meter recognition, road sign recognition, and invoice recognition. In controlled experimental settings, these tasks achieve relatively good performance (Xiu et al. Citation2022; Zhang et al. Citation2022). However, there is often a discrepancy between the training data and the data encountered in the actual production environment (Zhou et al. Citation2023). Domain adaptation is an important branch of transfer learning, suited to situations where the target domain and the source domain differ only in data, not in the task at hand (Zhao et al. Citation2020). It mainly achieves knowledge transfer through features shared between the source and target domains, which can effectively improve the model's performance on target domain data even when training data is scarce (Cao et al. Citation2021).

Recently, Unsupervised Domain Adaptation (UDA) methods have effectively improved generalization on the target domain by aligning distributions, but they are unable to learn class-discriminative boundaries (Li, Li, and Yu Citation2024; Saito et al. Citation2019; Tan and Zheng Citation2023). In practical applications, it is feasible to label a small amount of target domain data. However, when source and target domain data are used to train the model simultaneously and some target domain labels are available, the low sample density in the target domain can interfere with the extraction of source domain features. As a result, the model does not always improve as expected and, in some cases, transfer performance even declines (Saito et al. Citation2019; Zhou et al. Citation2023). Moreover, even when instance data from the target domain can be obtained, fully labeling it would require a significant amount of manpower, which is unrealistic in practice.

Previous methods aimed to reduce the distribution shift between the source and target domains through adversarial learning and the minimization of statistical discrepancies, so that a classifier trained on the source domain could be applied directly to the target domain (Ge et al. Citation2023). However, label mismatch was a significant issue for these methods. Due to architectural limitations, the small sample size in the target domain and the density difference between the source and target domains made it difficult for adversarial-learning-based models to effectively distinguish between data from the two domains (Zhao et al. Citation2020).

To address the issues mentioned above, this paper proposes a human-in-one-loop active domain adaptation framework based on Target Domain Feature Generation (TDFG), as shown in Figure 1. Image style transfer is used to generate target domain images with correct labels. This process requires a certain amount of data from which to learn the image style of the target domain, so a human-in-the-loop (HITL) method is involved. HITL allows the model to better exploit unlabeled target domain data while improving performance: by incorporating human feedback, the model refines its understanding of the target domain, enhancing its ability to classify target domain data accurately.

Figure 1. The human-in-one-loop active domain adaptation framework based on Target Domain Feature Generation (TDFG). TDFG aims to enhance model performance on target domain data by involving human input in the data labeling process. By utilizing a co-training mechanism with an image generator and target domain classifier, high-quality labeled fake target domain data is continuously generated to improve classification accuracy.


Existing human-in-the-loop methods face the challenge of obtaining a reliable model with as few human resources as possible (Gómez-Carmona et al. Citation2024). To select high-value data for model correction, some past methods have focused mainly on the selection strategy (Konyushkova, Sznitman, and Fua Citation2017; Saito et al. Citation2019). In contrast, for data annotation, the Target Domain Feature Generation (TDFG) approach adopts the Top-N labeled data selection method with High Confidence (TopNHC), shown in Figure 4, which annotates the target domain samples that receive high-confidence predictions from the classifier. Through multiple rounds of training on high-confidence samples, the model achieves better performance in our experiments.

Furthermore, to reduce the amount of data annotation, the TDFG method utilizes a group of image generators, shown in the top part of Figure 1. A target domain image generator learns the features of target domain images and generates more of them. TDFG enhances the model's performance on the target domain while annotating fewer critical data points, thus reducing the consumption of human resources. This allows human expertise, the human annotator in Figure 1, to be spent efficiently on annotating the most informative examples, those that contribute most to improving the model's performance on the target domain.

Here is a summary of the entire framework. 1) To achieve human-in-one-loop, the framework is designed in two stages. The human annotator participates only in the first stage of data labeling; the second stage of data labeling is undertaken by a target domain classifier with guaranteed performance. 2) To obtain robust migration performance and alleviate label mismatch, an image generator based on multiple CycleGANs (the red component in Figure 1) forms an iterative training mechanism with a target domain classifier. In each iteration, the classifier (the green component in Figure 1) makes predictions on the target domain data and selects a portion of labeled target domain data, which together with the source domain data forms the training data for the image generator. The trained image generator then generates more, and higher-quality, labeled fake target domain data to improve the performance of the target domain classifier. The first iteration constitutes the first stage of the framework, in which data labeling is done by the trained source domain classifier (the blue component in Figure 1) and mislabels are corrected manually. The subsequent iterations constitute the second stage, in which data labeling is entirely undertaken by the target domain classifier trained in the corresponding iteration. 3) To reduce the manual labeling workload and ensure the reliability of autonomous labeling in the second stage, this paper proposes the TopNHC method, through which TDFG provides more reliable autonomous labels and sustains a virtuous cycle of iterative training between the target domain classifier and the image generator.

The contributions of this paper are as follows. 1) We propose a human-in-one-loop active domain adaptation framework based on Target Domain Feature Generation (TDFG), which achieves high-performance migration while requiring only a small amount of manual labeling in a single iteration. 2) The image generator creates target domain features, guaranteeing the quantity and quality of the generated data and minimizing the labeling workload. 3) We design a Top-N labeled data selection method with High Confidence (TopNHC), which efficiently selects valuable samples and provides a new, optional approach to achieving human-in-one-loop while reducing the manual labeling workload. 4) The TDFG framework achieves an average classification accuracy of 0.8869 across six data domain pairs, almost double that of the classical domain adaptation method DSN.

Related Works

Deep Domain Adaptation

Deep domain adaptation (Wang and Deng Citation2018) achieves knowledge transfer with deep learning models and can generally be divided into the following categories:

  1. Discrepancy-based Deep Domain Adaptation. Such methods mainly focus on reducing the feature differences between domains so that the feature extractor learns features shared across domains. Tzeng et al. (Citation2014) proposed the Deep Domain Confusion (DDC) network, which uses Maximum Mean Discrepancy (MMD) (Borgwardt et al. Citation2006) to quantify inter-domain distances and adds the MMD to the network's loss function, reducing the feature differences between the source and target domains (a sketch of such an MMD term follows this list). Building on DDC, joint MMD (Long et al. Citation2017), joint probability MMD (Zhang and Wu Citation2020), weighted MMD (Yan et al. Citation2017), and other measures have emerged, continuously improving the performance of discrepancy-based deep domain adaptation. Recently, the scenario in which multiple source domains represent a variety of distributions has become common in practical applications, and multi-source domain adaptation can significantly enhance the effectiveness of models in real-world settings. Guided by this principle, Cheng et al. (Citation2024) proposed the Deep Joint Semantic Adaptation Network to address migration from multiple source domains to a single target domain. This approach uses Joint Semantic Maximum Mean Discrepancy, a new MMD-based metric, to uniformly optimize the cross-domain joint distribution of category-corresponded subdomains across multiple task-specific layers.

  2. Adversarial-based Deep Domain Adaptation, inspired by Generative Adversarial Networks (GAN) (Goodfellow et al. Citation2020). Ganin et al. (Citation2016) first applied the adversarial concept to domain adaptation by proposing Domain Adversarial Neural Networks (DANN): the feature extractor aims to extract shared features that improve the task classifier, while the domain classifier aims to distinguish whether the extracted features belong to the source or target domain, forming an adversarial mechanism that improves the migration performance of the model. Wang et al. (Citation2019) added an attention mechanism to DANN, arguing that not all regions in an image are worth migrating; images are divided into sub-regions, the domain classifier is decomposed into sub-domain classifiers over those regions, and only the worthwhile parts are migrated. Bousmalis et al. (Citation2017) proposed PixelDA, a domain adaptation method based on pixel-space generation, which adversarially transforms source domain data to resemble the target domain data and trains the recognition model directly on the transformed source domain data.

  3. Reconstruction-based Deep Domain Adaptation. The idea of this method is to encode and then reconstruct the input data, ensuring that the encoding process extracts shared features while features unique to each domain survive reconstruction. Ghifary et al. (Citation2016) used an AutoEncoder (AE) (Lore, Akintayo, and Sarkar Citation2017) to extract features common to the source and target domains and fed them into a decoder to reconstruct the target domain samples, ensuring that the model extracts shared features while preserving some information specific to the target domain, thus improving performance on the target domain. Ghifary et al. (Citation2015) focused on domain adaptation under multitasking, using one encoder with multiple decoders to reconstruct data from different domains. Bousmalis et al. (Citation2016) treated domain-shared and domain-private features separately: an adversarial mechanism between the feature extractor and a domain classifier forces the two kinds of features to differ enough that domain-shared features are extracted effectively, while requiring that the combination of shared and private features can reconstruct the original data, ensuring that no features are missed. Relatedly, in the area of handwritten digit generation, Anwar et al. (Citation2021) proposed a new framework that collects user-written character shapes and then, based on prior knowledge, computes local peaks from horizontal and vertical projection functions. Human participation significantly enhances the speed at which digits are generated.
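To make the discrepancy-based idea in item 1 concrete, the following is a minimal sketch of an MMD penalty of the kind DDC adds to the classification loss. The Gaussian kernel, the bandwidth, and the variable names are illustrative assumptions, not the exact configuration of any cited method.

```python
import torch

def mmd_rbf(source_feats: torch.Tensor, target_feats: torch.Tensor,
            bandwidth: float = 1.0) -> torch.Tensor:
    """Squared MMD between two feature batches under an RBF kernel (a sketch)."""
    def rbf(a, b):
        # Pairwise squared Euclidean distances, mapped through a Gaussian kernel.
        d2 = torch.cdist(a, b).pow(2)
        return torch.exp(-d2 / (2 * bandwidth ** 2))

    k_ss = rbf(source_feats, source_feats).mean()
    k_tt = rbf(target_feats, target_feats).mean()
    k_st = rbf(source_feats, target_feats).mean()
    return k_ss + k_tt - 2 * k_st

# DDC-style training would then use something like:
#   loss = classification_loss + lambda_mmd * mmd_rbf(f_source, f_target)
```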

Recently, Tan and Zheng (Citation2023) introduced a method for active learning in object detection. In contrast to the approach described in Tan and Zheng (Citation2023), our method focuses on object classification. It is important to note that object classification and object detection exhibit distinct performances in a human-in-the-loop (HITL) setting. Unlike the method proposed by Tan and Zheng (Citation2023), the Target Domain Feature Generation (TDFG) does not train directly on unlabeled data from the target domain. Instead, it concentrates on learning the data of the target domain through image style transfer. This process involves using source domain data as input and generating new data in the style of the target domain, thereby creating accurate pairs of target domain data and labels. Li, Li, and Yu (Citation2024) proposed an inter-domain mixup approach, employing a cross-domain alignment strategy that incorporates label information into model adaptation. This additional supervision aids in cross-domain feature alignment and reduces label mismatch issues. Contrary to Li, Li, and Yu (Citation2024), TDFG employs a generator for producing target domain data, rather than using adversarial training. The generator group in TDFG simultaneously learns features from both the target and source domains, performing style transfer on target domain data to generate samples that are stylistically aligned with the target domain.

Human-In-The-Loop and Active Domain Adaptation

The purpose of human-in-the-loop (HITL) (Budd, Robinson, and Kainz Citation2021; Wu et al. Citation2022) is to improve the performance of machine learning algorithms and enhance human productivity through interaction between humans and the algorithms. Active learning and data selection are the foundations of human-in-the-loop. Active learning aims to optimize model performance with a limited amount of labeled data (Ren et al. Citation2021): from an unlabeled dataset, the oracle selects a small portion of samples for manual labeling in each iteration, after which the model is trained and its performance tested; this process repeats until performance reaches the desired level. For data selection, the methods used in active learning vary with the practical scenario, with standard selection criteria based on uncertainty, diversity, representativeness, reducing expected error, maximizing expected label change, and so on (Fu, Zhu, and Li Citation2013).

The performance of migration models can also be improved if labels are provided for target domain data through human-in-the-loop. Many works have therefore combined active learning with domain adaptation, giving rise to active domain adaptation. Mayer and Timofte (Citation2020) used adversarial generation within active learning to produce many uncertain samples to be labeled and used in model training. Building on the DANN network (Ganin et al. Citation2016), Su et al. (Citation2020) proposed selecting and labeling target domain data that classifiers find difficult to predict accurately, based on classifier prediction entropy and domain discriminator cross-entropy; such data are considered less similar to the source domain data. Saito et al. (Citation2018) used fine-tuning (Yosinski et al. Citation2014) as the migration method and designed a tri-training labeling scheme to enrich target domain labels: two independent source domain classifiers predict the target domain data, and a label is considered accurate if the two predictions agree. Prabhu et al. (Citation2021) proposed the Clustering Uncertainty-weighted Embeddings (CLUE) data selection method for active domain adaptation, which uses uncertainty-weighted clustering to identify target data for labeling that are both uncertain under the model and diverse in feature space. Ma, Gao, and Xu (Citation2021) proposed the Clustered Non-Transferable Gradient Embedding (CNTGE) strategy, which uses the principles of transferability, diversity, and uncertainty to select and label target domain data.

Rangwani et al. (Citation2021) proposed an ensemble-based information criterion that selects uncertain, diverse, and representative target domain data by sensing other samples in a target domain subset and labeling them. To effectively improve the performance of machine learning models, Wu et al. (Citation2023) proposed a HITL-based approach that augments the capabilities of autonomous driving systems. HITL has also been shown to enable more efficient model compression for inference environments: Wang et al. (Citation2023) introduced a human-in-the-loop method for automatically generating deep neural networks for mobile applications. Gómez-Carmona et al. (Citation2024) conducted a quantitative study of HITL approaches, highlighting the importance of considering human factors in designing more effective and flexible human-machine collaborative systems. Tan and Zheng (Citation2023) introduced an active learning and semi-supervised approach that fully leverages unlabeled data to enhance the accuracy of deep object detection.

TDFG integrates human-in-the-loop with image style transfer, achieving superior performance while reducing labor costs. By utilizing this approach, TDFG effectively improves the performance of classification models and minimizes the consumption of human labor resources. This advancement contributes significantly to the related work in the field, demonstrating the potential of combining HITL strategies with advanced data augmentation techniques to address the challenges in machine learning applications.

Related Work Summary

The works mentioned above have creatively designed many transfer learning techniques and methods to improve migration performance and have integrated domain adaptation with human-in-the-loop to promote the development of related fields. Recent methods effectively extract domain-invariant features using semi-supervised approaches and by minimizing differences across domains. However, considerable room for improvement remains in addressing the imbalance between the numbers of source and target domain samples, as well as the epistemic uncertainty (Kendall and Gal Citation2017) caused by the lack of target domain data. Furthermore, most of the active learning methods above focus only on data selection and have not yet recognized the feasibility of image style transfer for reducing labor costs. We designed TDFG to address these overlooked issues, employing image transfer for data augmentation to alleviate data imbalance and the TopNHC method to reduce the labor cost of data annotation.

Methodology

Define the source domain as $\mathcal{D}_S = \{\mathcal{X}_S, P_S\}$ and the target domain as $\mathcal{D}_T = \{\mathcal{X}_T, P_T\}$, where $\mathcal{X}$ denotes a data space and $P$ the distribution of the corresponding data. Define the source domain task as $\mathcal{T}_S = \{\mathcal{Y}_S, f_S(\cdot)\}$ and the target domain task as $\mathcal{T}_T = \{\mathcal{Y}_T, f_T(\cdot)\}$, where $\mathcal{Y}$ denotes a label space and $f(\cdot)$ the mapping from the data space to the label space. Define the source domain data as $X_S = \{(x_S^i, y_S^i)\}_{i=1}^{n_S}$, where $x_S^i \in \mathcal{X}_S$ and $y_S^i \in \mathcal{Y}_S$, and the target domain data as $X_T = \{x_T^i\}_{i=1}^{n_T}$, where $x_T^i \in \mathcal{X}_T$. The purpose of this paper is to train an $\hat{f}_T$ that is close enough in function space to the true mapping $f_T$ from the target domain data space to the label space, so as to accurately map the target domain data $X_T$ to the label space $\mathcal{Y}_T$ even when the source and target domains share few common features.

Human-In-One-Loop Active Domain Adaptation Framework

This paper proposes an active domain adaptation framework based on Target Domain Feature Generation (TDFG) to address poor migration performance under insufficient shared features together with the label mismatch problem. The TDFG framework is designed in two stages to avoid the large, round-by-round manual labeling costs associated with human-in-the-loop. The first stage is active generation domain adaptation; the human annotator needs to participate only in this stage of data labeling. The process of the first stage is shown in Figure 2.

Figure 2. The first stage, active generation domain adaptation. In this stage, a trained source domain classifier will make predictions on the target domain data, and based on the prediction results, a portion of the target domain data is selected using the TopNHC data selection method (cf., section “Top-N labeled data selection method with high confidence”), and a subset of accurately labeled target domain data is obtained by mislabeling correction of the oracle. An image generator is trained using those accurately labeled target domain data and the source domain data. The image generator takes the source domain data as input and generates a large amount of labeled fake target domain data. The labeled fake target domain data and those accurately labeled target domain data are used to train the target domain classifier.


Framework Overview

The framework consists of two main stages: the first involves manual labeling by the human annotator, while the second uses a target domain classifier for autonomous labeling. An image generator based on multiple CycleGANs generates high-quality labeled fake target domain data, which is used to enhance the performance of the target domain classifier, as shown in the right part of Figure 1.

Training Details

In the first stage, the unlabeled target domain data $X_T = \{x_T^i\}_{i=1}^{n_T}$ is predicted by the source domain classifier $\hat{C}_S$, which is trained on the source domain data $X_S = \{(x_S^i, y_S^i)\}_{i=1}^{n_S}$. The $n_{true}$ samples with the highest prediction confidence are then selected by the TopNHC data selection method (cf., section "Top-N labeled data selection method with high confidence") and manually corrected to obtain a set of accurately labeled target domain data $X_{Ttrue}^0 = \{(x_T^{0;i}, y_{Ttrue}^{0;i})\}_{i=1}^{n_{true}}$. This accurately labeled target domain data $X_{Ttrue}^0$ and the source domain data $X_S$ are used to train the image generator $G_{ST}^0$ (cf., section "Image generator"). The trained image generator $G_{ST}^0$ can generate a set of labeled fake target domain data $X_{Tfake}^0 = \{(x_{Tfake}^{0;i}, y_{Tfake}^{0;i})\}_{i=1}^{n_{fake}}$ from a set of source domain data. The labeled fake target domain data $X_{Tfake}^0$ and the accurately labeled target domain data $X_{Ttrue}^0$ are mixed as the training data for the target domain classifier $C_T^0$, which is initialized with the parameters of the trained source domain classifier $\hat{C}_S$. Equation (1), Equation (2), and Equation (3) respectively define the loss function of the source domain classifier, the loss function of the target domain classifier, and the TopNHC data selection process in the first stage.

(1) $\mathcal{L}_{C_S} = -\sum_{i=1}^{n_S} y_S^i \log C_S(x_S^i)$
(2) $\mathcal{L}_{C_T} = -\sum_{i=1}^{n_{fake}} y_{Tfake}^i \log C_T(x_{Tfake}^i) - \sum_{i=1}^{n_{true}} y_{Ttrue}^i \log C_T(x_{Ttrue}^i)$
(3) $X_{Ttrue}^0 = \left\{\left(x_T^{0;i}, y_{Ttrue}^{0;i}\right)\right\}_{i=1}^{n_{true}} = \mathrm{TopNHC}\left(\hat{C}_S, X_T, n_{true}\right)$
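As a minimal PyTorch sketch of Equations (1) and (2): the source classifier is trained with cross-entropy on labeled source data, and the target classifier with cross-entropy over the generated fake target data plus the small set of accurately labeled target data. The tensor and model names here are placeholders, not the authors' code.

```python
import torch.nn.functional as F

def source_classifier_loss(c_s, x_s, y_s):
    # Equation (1): cross-entropy on labeled source domain data.
    return F.cross_entropy(c_s(x_s), y_s)

def target_classifier_loss(c_t, x_fake, y_fake, x_true, y_true):
    # Equation (2): cross-entropy over labeled fake target data plus
    # the accurately labeled real target data.
    return (F.cross_entropy(c_t(x_fake), y_fake)
            + F.cross_entropy(c_t(x_true), y_true))
```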

In the first stage, the idea of active learning is introduced to provide accurate labels for part of the target domain data, so that more private but valuable features of the target domain can be learned; this lays the foundation for applying the migration framework to domains with less feature commonality. In addition, the image generator learns the mapping from the source domain to the target domain and generates a more diverse and larger amount of labeled fake target domain data. These generated data enable the target domain classifier to learn target domain features more adequately, improving its performance and generalization ability. However, the labeled fake target domain data generated by the image generator in a single epoch may suffer from insufficient volume and low quality, preventing the target domain classifier from reaching the desired migration performance. Therefore, the second stage of the framework composes the target domain classifier and the image generator into an iterative training mechanism, as shown in Figure 3.

Figure 3. The second stage, iterative generation training. The image generator and target domain classifier are designed to form an iterative generation training mechanism to generate more diverse and larger amounts of labeled fake target domain data for training the target domain classifier. In this stage, the target domain classifier with reliable performance directly performs the data labeling work without human involvement. The autonomously labeled data from the current iteration will train the image generator together with the source domain data and the accurately labeled target domain data. The labeled fake target domain data generated by the image generator will train the target domain classifier together with the accurately labeled target domain data. The performance of the image generator and target domain classifier will be continuously improved in iterations.


After the target domain classifier and the image generator are trained in the first iteration, the framework enters the iterative generation training stage. The target domain classifier trained in the $j$-th iteration, $\hat{C}_T^j(\cdot)$, makes predictions on the target domain data. The TopNHC data selection method then selects a set of target domain data with high prediction confidence, $X_{Tmodel}^j = \{(x_T^{j;i}, y_{Tmodel}^{j;i})\}_{i=1}^{n_j}$, and the prediction results are used directly as the labels of this set. The source domain data $X_S$, the selected autonomously labeled target domain data $X_{Tmodel}^j$, and the accurately labeled target domain data $X_{Ttrue}^0$ are then used to train the image generator $G_{ST}^j$ (cf., section "Image generator"). After training, the image generator generates a large amount of labeled fake target domain data $X_{Tfake}^j$ from the source domain data. The labeled fake target domain data $X_{Tfake}^j$ and the accurately labeled target domain data $X_{Ttrue}^0$ serve as training data for the target domain classifier in the next iteration. The TopNHC data selection process and the loss function of the target domain classifier in the second stage are shown in Equation (4) and Equation (5), respectively.

(4) $X_{Tmodel}^j = \left\{\left(x_T^{j;i}, y_{Tmodel}^{j;i}\right)\right\}_{i=1}^{n_j} = \mathrm{TopNHC}\left(\hat{C}_T^j, X_T, n_j\right)$
(5) $\mathcal{L}_{C_T^j} = -\sum_{i=1}^{n_j} y_{Tfake}^{j;i} \log C_T^j(x_{Tfake}^{j;i}) - \sum_{i=1}^{n_{true}} y_{Ttrue}^i \log C_T^j(x_{Ttrue}^i)$
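The second stage can be summarized in the following sketch of the iterative loop around Equations (4) and (5). Every helper here (`topnhc_select`, `train_generator_group`, `generate_fake_target`, `train_classifier`, and the selection schedule `n_schedule`) is a hypothetical placeholder, not the authors' code.

```python
# A sketch of the second-stage iterative generation training; data sets
# are represented as lists of (sample, label) pairs.
def second_stage(classifier, source_data, target_data, true_labeled,
                 n_schedule):
    for n_j in n_schedule:
        # Eq. (4): autonomous labeling -- keep the n_j most confident
        # target samples together with their predicted labels.
        x_model, y_model = topnhc_select(classifier, target_data, n_j)
        model_labeled = list(zip(x_model, y_model))

        # Retrain the per-category CycleGAN group on the source data plus
        # all labeled target data (manual corrections + autonomous labels).
        generators = train_generator_group(source_data,
                                           model_labeled + true_labeled)

        # Generate labeled fake target data from the source data, then
        # retrain the target classifier on it plus the true labels (Eq. 5).
        fake_labeled = generate_fake_target(generators, source_data)
        classifier = train_classifier(fake_labeled + true_labeled)
    return classifier
```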

In the second stage, the image generator forms a feedback incentive mechanism with the target domain classifier: the image generator produces a large amount of fake target domain data with reliable labels, and these data improve the classifier by providing target domain features of increasing quantity and quality. The improving labeling accuracy of the classifier in turn lets the TopNHC data selection method select high-confidence samples from a more accurately labeled pool, continuously improving the reliability of autonomous labeling. These target domain data with highly accurate autonomous labels are then used as training data for the image generator, improving its generation performance.

The process of the active domain adaptation framework based on TDFG is shown in Algorithm 1.

Algorithm 1. Active domain adaptation framework based on Target Domain Feature Generation (TDFG).

Analysis of Manual Workload

The manual workload incurred by TDFG is primarily the manual correction of incorrect labels. Humans still need to review the accurately labeled data, even when no further action is required; however, with the emergence of various data labeling tools, checking accurately labeled data has become easy and quick, so this part of the workload can be disregarded. With this method, when deploying a deep learning model, the model parameters of the online system can be updated with less manpower, achieving better performance.

Analysis of Time Complexity

To analyze the time complexity, we adopt the following notation: $G$ denotes a process related to the image generator and $C$ a process related to the classifier; the superscript $t$ indicates training and $i$ inference; the subscript $s$ stands for source and $t$ for target.

The time complexity of each step is as follows. In the first iteration, $O(C_s^t)$ is spent training the source domain classifier and $O(G^t)$ training the image generator; prediction costs $O(C_s^i)$ (Algorithm 1, step 5), image generation $O(G^i)$, and target classifier training $O(C_t^t)$ (Algorithm 1, step 6). This process is repeated in a loop $J$ times, giving a total complexity of $J(O(C_t^i) + O(G^t) + O(G^i) + O(C_t^t))$. Since $J$ is a constant, this simplifies to $O(C_t^i) + O(G^t) + O(G^i) + O(C_t^t)$.

$O(C_t^t)$ equals $O(wme)$, where $w$ is the number of parameters, $m$ the number of training examples, and $e$ the number of training epochs.

$O(G^t)$ is $O(wmT)$, where $w$ is the number of parameters, $m$ the number of training examples, and $T$ the number of training iterations.

Since inference is required only once and the number of training steps depends on the batch count, the time complexity of the entire pipeline can be simplified to $J(O(G^i) + O(C_t^t))$.

Top-N Labeled Data Selection Method with High Confidence

This paper posits a hypothesis: if high-confidence samples contain errors, this indicates significant cognitive issues within the model, meaning that the model confidently misclassifies samples into the wrong categories. The root of this problem lies in the epistemic uncertainty (Kendall and Gal Citation2017) caused by the scarcity of data in the target domain. With human intervention, these confidently misclassified samples can be identified and corrected. In subsequent training rounds, the model, before its weights are updated, will still make incorrect predictions on these samples; consequently, during back-propagation, the batch containing these errors yields a larger loss than batches without them. This leads to a more effective update of the model's weights, effectively reducing the model's epistemic uncertainty and improving its accuracy. Therefore, this paper proposes a new data selection method, the Top-N labeled data selection method with High Confidence (TopNHC), to reduce the amount of manual labeling in the first stage and ensure the reliability of autonomous labels in the second stage. TopNHC selects the samples predicted by a classifier with the highest confidence. The process is shown in Figure 4, taking the source domain classifier predicting the target domain data as an example.

Figure 4. Top-N labeled data selection method with high confidence (TopNHC). Take the source domain classifier to label the target domain data as an example. First, after the source domain classifier predicts the target domain data, the prediction result of each sample is a confidence sequence with a length equal to the total number of categories. Subsequently, the confidence sequence corresponding to each sample is maximized, and all the samples are reordered from the largest to the smallest by the maximum confidence. Finally, the top N samples with the highest maximum confidence are taken as the selected data.


TopNHC represents the first round of data screening in TDFG, led predominantly by the machine. As shown in Figure 4, after the source domain classifier predicts each target domain sample, a confidence list of length equal to the total number of categories $K$ is obtained, and the TopNHC data selection method uses this list as the basis for selection. First, the maximum value of each confidence list is recorded. Second, the target domain data are sorted from highest to lowest by this value. Third, the top $N$ target domain data with the highest confidence are selected to form the selected data set, as shown in Equation (6).

(6) $X_T^{\mathrm{TopNHC}} = \mathrm{sort}\left(\max\left(\mathrm{softmax}\left(\mathrm{Pred}\left(\hat{C}_S, X_T\right)\right)\right)\right)[:N]$
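A minimal PyTorch sketch of Equation (6) follows, assuming the classifier returns logits for a batch of target samples; the function name and the batched tensor interface are our own assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def topnhc_select(classifier, x_target, n):
    """Top-N High Confidence selection (Equation (6)), as a sketch."""
    probs = F.softmax(classifier(x_target), dim=1)  # (num_samples, K)
    conf, pred = probs.max(dim=1)                   # max confidence and its class
    top_idx = conf.argsort(descending=True)[:n]     # sort by confidence, keep top N
    return x_target[top_idx], pred[top_idx]         # samples and predicted labels
```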

Advantages of TopNHC

The Top-N labeled data selection method with High Confidence (TopNHC) has the following advantages. 1) TopNHC saves manual labeling workload because the number of mislabels in the selected data is small: the human annotator only needs to correct the few labels that the classifier predicted inaccurately. 2) TopNHC guarantees the reliability of autonomous labeling in the second stage. High confidence is verified to correlate positively with high recognition accuracy; since TopNHC selects data predicted with high confidence, the proportion of accurate predictions among the selected data is higher. 3) TopNHC can select confusing data. The data selected by TopNHC are the samples predicted by the classifier with high confidence; if a prediction among such samples is still wrong, that sample is considered to have high transfer learning value. For example, if a digit 7 is classified as a digit 1 with high confidence, then correcting that mislabel and providing it to the classifier for learning can improve the classifier's ability to discriminate between digits 1 and 7.

Image Generator

This section describes the input, output, and functions of the image generator, and introduces its principle and implementation.

The image generator is implemented with multiple CycleGANs, each responsible for generating data for a specific category. The CycleGANs are trained separately and operate independently. The structure of each CycleGAN, including the number of layers, the number of neurons, and other parameters, is identical to that proposed in the original paper (Zhu et al. Citation2017). The generator aims to translate source domain data into outputs that resemble the target domain data, so that the generated data is indistinguishable from real target domain data. By training a source-to-target domain generator and a target-to-source domain generator, the image generator can produce high-quality labeled fake target domain data. The architecture of the image generator, the group of CycleGANs, is shown in Figure 5. Each CycleGAN performs learning and inference independently.

For target domain data generation, adversarial generation methods are generally preferred. However, most adversarial generation methods (Goodfellow et al. Citation2020) require paired training data, which is costly to acquire. In contrast, CycleGAN (Zhu et al. Citation2017) is better suited to domain adaptation than other adversarial generation methods because it can learn mappings from one domain to another without paired training data, as shown in Figure 5. However, CycleGAN learns a complete inter-domain mapping and cannot be directed to generate a specific category of target domain data. Therefore, to ensure that the generated target domain data are dependably labeled and that the data volume is balanced across categories, this framework implements an image generator based on multiple CycleGANs, where each CycleGAN acts only on the subsets of source and target domain data sharing the same label. Since each CycleGAN learns the complete mapping between the two domains, the generated labeled fake target domain data remain of high quality even when some target domain samples are mislabeled. The process of the image generator designed in this framework is shown in Figure 5.

Figure 5. Image generator. The image generator is implemented using multiple CycleGANs. First, the source domain data and the labeled target domain data are split and reorganized by category to form the training data for each CycleGAN. Each CycleGAN is responsible for one category of data generation; its training data are the source domain data and the labeled target domain data belonging to the same category. In the inference generation period, its input is the source domain data, and its output is the generated labeled fake target domain data.


Take the $j$-th iteration as an example. $X_S^j$ and $X_T^j$ (comprising $X_{Tmodel}^j$ or $X_{Ttrue}^0$) denote the source domain data and the labeled target domain data, respectively. $K$ denotes the total number of categories in the dataset; $D_T^{j;k}$, $D_S^{j;k}$, $G_{ST}^{j;k}$, and $G_{TS}^{j;k}$ denote the target domain discriminator, source domain discriminator, source-to-target domain generator, and target-to-source domain generator of the $k$-th CycleGAN (corresponding to category $k$). The image generator takes the source domain data $X_S^j$ and the labeled target domain data $X_T^j$ as input; after splitting and reorganizing them by category, $K$ sets of data are obtained as the formal inputs of the $K$ CycleGANs. Taking category $k$ as an example, the training and inference of the $k$-th CycleGAN are shown on the right panel of Figure 5.
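The per-category split and the generator group can be sketched as follows. `CycleGAN` stands in for the unmodified architecture of Zhu et al. (2017), and `fit`/`g_st` are hypothetical interfaces for unpaired training and source-to-target translation; data sets are lists of (image, label) pairs.

```python
# Sketch of the generator group: one CycleGAN per category.
def train_generator_group(source_data, labeled_target, num_classes=10):
    generators = {}
    for k in range(num_classes):
        # Each CycleGAN sees only the source/target samples of category k.
        src_k = [x for x, y in source_data if y == k]
        tgt_k = [x for x, y in labeled_target if y == k]
        gan_k = CycleGAN()          # placeholder for the original model
        gan_k.fit(src_k, tgt_k)     # unpaired source<->target training
        generators[k] = gan_k
    return generators

def generate_fake_target(generators, source_data):
    # Each source image is translated by the CycleGAN of its own category;
    # the category itself is reused directly as the label (cf. Eq. 13).
    return [(generators[y].g_st(x), y) for x, y in source_data]
```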

For target domain data generation, the training paradigm of CycleGAN is to train the source-to-target domain generator so that the target domain discriminator cannot distinguish the generated labeled fake target domain data from real target domain data, while requiring that, after being mapped back by the target-to-source domain generator, the reconstructed data is sufficiently similar to the original source domain data. The target-to-source domain generator and the source domain discriminator are trained on the same principle and, together with the former, form a complete training process. The training of the $k$-th CycleGAN in the $j$-th iteration is shown in Equations (7) to (12). Equations (7) and (8) describe the value functions for training the target-to-source domain generator, source-to-target domain generator, source domain discriminator, and target domain discriminator. Equations (9) and (10) constrain the feature similarity between the original data and the result of applying the domain mapping followed by its inverse. Equations (11) and (12) describe the overall value function of the CycleGAN ($\lambda$ is a hyperparameter) and the CycleGAN's adversarial training process.

(7) $V_{TS}^{j;k}\left(G_{TS}^{j;k}, D_S^{j;k}, X_S^{j;k}, X_{Tmodel}^{j;k}\right) = \sum_{i=1}^{n_j}\left[\log D_S^{j;k}\left(x_S^{j;k;i}\right) + \log\left(1 - D_S^{j;k}\left(G_{TS}^{j;k}\left(x_{Tmodel}^{j;k;i}\right)\right)\right)\right]$
(8) $V_{ST}^{j;k}\left(G_{ST}^{j;k}, D_T^{j;k}, X_S^{j;k}, X_{Tmodel}^{j;k}\right) = \sum_{i=1}^{n_j}\left[\log D_T^{j;k}\left(x_{Tmodel}^{j;k;i}\right) + \log\left(1 - D_T^{j;k}\left(G_{ST}^{j;k}\left(x_S^{j;k;i}\right)\right)\right)\right]$
(9) $\mathcal{L}_{similar}^{j;k}\left(G_{TS}^{j;k}, G_{ST}^{j;k}, X_S^{j;k}, X_{Tmodel}^{j;k}\right) =$
(10) $\sum_{i=1}^{n_j}\left[\left\|G_{TS}^{j;k}\left(G_{ST}^{j;k}\left(x_S^{j;k;i}\right)\right) - x_S^{j;k;i}\right\| + \left\|G_{ST}^{j;k}\left(G_{TS}^{j;k}\left(x_{Tmodel}^{j;k;i}\right)\right) - x_{Tmodel}^{j;k;i}\right\|\right]$
(11) $V_{cyclegan}^{j;k}\left(G_{TS}^{j;k}, G_{ST}^{j;k}, X_S^{j;k}, X_{Tmodel}^{j;k}, D_S^{j;k}, D_T^{j;k}\right) = V_{TS}^{j;k} + V_{ST}^{j;k} + \lambda \mathcal{L}_{similar}^{j;k}$
(12) $\min_{G_{ST}^{j;k}} \max_{D_T^{j;k}} \min_{G_{TS}^{j;k}} \max_{D_S^{j;k}} V_{cyclegan}^{j;k}\left(G_{TS}^{j;k}, G_{ST}^{j;k}, X_S^{j;k}, X_{Tmodel}^{j;k}, D_S^{j;k}, D_T^{j;k}\right)$
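The following is an illustrative PyTorch rendering of Equations (7) to (11) for one category's CycleGAN. The binary cross-entropy form of the adversarial terms, the L1 cycle loss, and the default $\lambda = 10$ are common CycleGAN conventions assumed here rather than details given in the paper.

```python
import torch
import torch.nn.functional as F

def cyclegan_value(g_st, g_ts, d_s, d_t, x_s, x_t, lam=10.0):
    """One illustrative evaluation of Equations (7)-(11) for category k."""
    fake_t = g_st(x_s)   # source -> target translation
    fake_s = g_ts(x_t)   # target -> source translation

    def adv(disc, real_x, fake_x):
        # Eqs. (7)-(8): the discriminator should score real data as 1
        # and generated data as 0.
        real_score, fake_score = disc(real_x), disc(fake_x)
        return (F.binary_cross_entropy(real_score, torch.ones_like(real_score))
                + F.binary_cross_entropy(fake_score, torch.zeros_like(fake_score)))

    adversarial = adv(d_s, x_s, fake_s) + adv(d_t, x_t, fake_t)

    # Eqs. (9)-(10): cycle consistency -- mapping into the other domain and
    # back must reconstruct the original sample.
    cycle = F.l1_loss(g_ts(fake_t), x_s) + F.l1_loss(g_st(fake_s), x_t)

    return adversarial + lam * cycle  # Eq. (11)
```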

After each iteration of training, using source domain data of a specific category as input, each CycleGAN's source-to-target domain generator generates labeled fake target domain data, equal in number to the input source domain data, with the corresponding category used directly as the label. Taking the trained source-to-target domain generator $\hat{G}_{ST}^{j;k}$ of the $k$-th CycleGAN in the $j$-th iteration as an example, the generation process is shown in Equation (13).

(13) $x_{Tfake}^{j;k} = \hat{G}_{ST}^{j;k}\left(x_S^{j;k}\right)$

Domain Classifier

The source domain classifier and the target domain classifier share the same structure as VGG-11; the number and type of neural network layers are not modified. The target domain classifier plays a crucial role in the second stage of the framework, where it autonomously labels the target domain data. The classifier is trained iteratively alongside the image generator, improving its performance with the generated fake target domain data. TopNHC is used to select accurately labeled samples, reducing the need for manual intervention and ensuring reliable autonomous labeling.
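As a sketch of how the classifiers might be instantiated with torchvision (our assumption; the paper states only that an unmodified VGG-11 is used), with the output dimension of the final fully connected layer set to the ten digit classes:

```python
import torch.nn as nn
from torchvision.models import vgg11

def build_digit_classifier(num_classes: int = 10) -> nn.Module:
    # VGG-11 backbone; only the output size of the last fully connected
    # layer is set to the ten digit classes.
    model = vgg11(weights=None)
    model.classifier[6] = nn.Linear(4096, num_classes)
    return model

# Per the framework, the target domain classifier starts from the trained
# source domain classifier's weights before the first iteration:
#   target_clf = build_digit_classifier()
#   target_clf.load_state_dict(source_clf.state_dict())
```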

Co-Training Mechanism

In each iteration, the target domain classifier makes predictions on the target domain data, and a portion of the labeled target domain data, together with the source domain data, is selected for training the image generator. The trained image generator then generates more high-quality labeled fake target domain data, which is used to further train and improve the target domain classifier. This iterative process continues, with the first stage involving manual correction of the source domain classifier's mislabels and subsequent iterations relying on autonomous labeling by the target domain classifier.

Experiments

Experiments on labeled data selection and on domain adaptation are designed to verify how much manual labeling workload TopNHC saves and how reliable its selected labels are, as well as the migration performance of the TDFG-based active domain adaptation framework when domains share few common features.

Four data domains are used in the experiments of this paper, namely MNIST (m), MNIST-M (mm), SVHN (sv), and SYNTH (sy), as shown in Figure 6.

Figure 6. Four data domains. The task for all four data domains is digit 0–9 classification. Among them, MNIST is the classical handwritten digit dataset with white characters on a black background. MNIST_M is obtained by random coloring based on MNIST. SVHN is the photographed door digits. SYNTH is a dataset consisting of printed digits of different colors placed on disturbed background images.


Among them, the feature differences between MNIST and MNIST-M are minimal (both are handwritten digits); the feature differences between SVHN and SYNTH are also relatively small (both are printed digits); there are partial feature differences between MNIST and SYNTH (differences between handwritten and printed digits), and the feature differences between MNIST and SVHN are significant (differences between handwritten and printed digits with camera interference in SVHN).

For a valid and fair experimental comparison, the experimental framework is set up as follows. The VGG-11 network (Simonyan and Zisserman Citation2014) implements both the source domain classifier and the target domain classifier, and 5,000 labeled samples are selected from the target domain in the first stage. Six iterations are performed in the second stage: 500 target domain samples are selected in the first iteration, and each subsequent iteration selects another 500. In the first round, the additional 500 samples, combined with the original 5,000 labeled samples, form a training set of 5,500, as shown in Table 2. Ten independent CycleGANs implement the image generator.
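Under our reading of this schedule (consistent with the 5,500 to 8,000 range reported in Table 2), the cumulative labeled pool evolves as in this small sketch; the constant names are ours.

```python
# Cumulative labeled-data schedule implied by the setup above and Table 2.
FIRST_STAGE_LABELED = 5000   # manually corrected labels from stage one
SECOND_STAGE_ITERS = 6       # iterations of the second stage
PER_ITER_SELECTED = 500      # additional autonomous labels per iteration

totals = [FIRST_STAGE_LABELED + PER_ITER_SELECTED * (j + 1)
          for j in range(SECOND_STAGE_ITERS)]
print(totals)  # [5500, 6000, 6500, 7000, 7500, 8000]
```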

Data Selection Experiments

To measure the manual labeling workload of various data selection methods, the following methods are compared in the first stage of the framework: Random Sample (RS), Least Confident (LC), High Entropy (EN), and the Top-N labeled data selection method with High Confidence (TopNHC). Five sets of comparison experiments were designed, varying the sample size $Num_{sample}$ of the selected target domain data over 300, 800, 1500, 3000, and 5000. The comparison parameters are the actual labeled sample size $Num_{label}$ and the percentage reduction in manual labeling workload of the TopNHC method relative to the other methods, $Num_{reduce}$. The results are shown in Table 1.

Table 1. Comparison of the manual labeling effort required in the first stage of this framework under different data selection methods. Concretely, five data selection volumes from 300 to 5000, four data selection methods (Random Sample (RS), Least Confident (LC), High Entropy (EN), and the Top-N labeled data selection method with High Confidence (TopNHC)), and two source-target domain pairs (MNIST and SVHN; MNIST and SYNTH) are involved in the comparison. The parameters compared are the actual manual labeling volume $Num_{label}$ and the percentage reduction in manual labeling volume of TopNHC relative to the other methods, $Num_{reduce}$.

In Table 1, the $Num_{label}$ column shows that the TopNHC data selection method achieves the minimum manual labeling amount regardless of the sample size or the domain pair being migrated. The $Num_{reduce}$ column shows that TopNHC saves up to half of the manual labeling amount compared with the other data selection methods, and in the worst case still spares more than a hundred samples from manual labeling. This demonstrates that TopNHC effectively reduces the manual labeling workload independent of the migrated data.

In the second stage of the framework, this paper proposes replacing manual labeling with autonomous labeling, so that manual involvement in data labeling is no longer required in this stage, further saving manual workload. To verify the reliability of autonomous labeling, the TopNHC data selection method is compared with the three data selection methods mentioned above. Using two source-target domain pairs, the framework is run in full, and the autonomous labeling accuracy achieved in each training iteration of the second stage is compared. The results are shown in Table 2.

Table 2. Comparison of the autonomous labeling accuracy achieved using different data selection methods in the second stage. Concretely, four data selection methods (Random Sample (RS), Least Confident (LC), High Entropy (EN), and the Top-N labeled data selection method with High Confidence (TopNHC)), six iteration rounds with total data selection volume from 5500 to 8000, and two source-target domain pairs (MNIST and SVHN; MNIST and SYNTH) are involved in the comparison. The parameters compared are the actual manual labeling volume $Num_{label}$ and the autonomous labeling accuracy $ACC_r$.

As shown in Table 2, the TopNHC data selection method achieves the highest autonomous labeling accuracy on both source-target domain pairs. In particular, for the migration from MNIST to SVHN, TopNHC reaches an average autonomous labeling accuracy at least 20 percentage points higher than the other three data selection methods. In addition, TopNHC guarantees an autonomous labeling accuracy above 0.95, i.e., the autonomous labels provided by the classifier in the second stage are highly reliable.

Migration Performance Experiments

Seven sets of comparison experiments were designed to verify the migration performance of the active domain adaptation framework based on Target Domain Feature Generation. The variables include four migration models, DSN (Bousmalis et al. Citation2016), PixelDA (Bousmalis et al. Citation2017), VGG-11, and the TDFG framework, and three types of training data: source domain data (DS), source domain data mixed with the manually labeled target domain samples from the first stage (DST), and source domain data mixed with the manually labeled target domain data from the first stage plus the labeled fake target domain data generated in the first iteration (DSTF). The target domain classification accuracies of the various models and data types are compared on six pairs of data domains. Table 3 shows the results of each group of experiments.

Table 3. Comparison of the migration performance of different domain adaptation methods. Four domain adaptation methods (VGG-11, PixelDA, DSN, and the active domain adaptation framework based on Target Domain Feature Generation (TDFG)), three forms of training data (source domain data (DS); source domain data mixed with the manually labeled target domain samples from the first stage (DST); and source domain data mixed with the manually labeled target domain data from the first stage plus the labeled fake target domain data generated in the first iteration (DSTF)), and six source-target domain pairs (MNIST and SVHN, MNIST and SYNTH, MNIST_M and SVHN, SVHN and MNIST, SVHN and MNIST_M, SYNTH and MNIST_M) are involved in the comparison. The parameter compared is the migration performance of the models.

Comparing the first three columns of Table 3 individually, all models migrating from the source domain MNIST to the target domain SVHN, which shares fewer features with it, suffer some performance degradation compared with migrating to SYNTH, whose features differ less from those of MNIST. DSN(DS) degrades most severely, by almost 30%. This further illustrates that domain adaptation methods that only extract shared features between domains may suffer significant performance degradation when facing domains with less feature commonality.

According to Table 3, comparing TDFG with the traditional migration algorithm DSN(DS), the migration performance of TDFG is higher for every domain pair; for domain pairs with less feature commonality, such as MNIST to SVHN and MNIST-M to SVHN, TDFG's performance is more than twice that of DSN(DS). This shows that the TDFG framework overcomes the severe degradation of migration performance on such domain pairs. Comparing PixelDA(DST) with PixelDA(DSTF) shows that the labeled fake target domain data generated by the image generator helps improve migration performance: in 5 of the 6 data domain pairs, the migration model achieves higher performance thanks to the added labeled fake target domain data. The same can be seen by comparing DSN(DST) with DSN(DSTF), or VGG-11(DST) with VGG-11(DSTF).

In the second stage of this framework, the iterative generation mechanism contributes to the joint performance improvement of the image generator and the target domain classifier. Since the classifier in TDFG is implemented with VGG-11, comparing VGG-11(DST), VGG-11(DSTF), and TDFG in Table 3 shows that the accuracy achieved by the target domain classifier within the TDFG framework exceeds that of the VGG-11 network trained without iterative generation. This confirms that the iterative generation mechanism in the second stage helps improve the performance of the target domain classifier.

Comparing the migration performance of the various models in Table 3 overall, the TDFG framework proposed in this paper achieves the highest or second-highest migration performance on all domain pairs, demonstrating that, as an innovative domain adaptation framework, TDFG has good migration performance and concrete practical value.

To verify that our framework combines high migration performance with a low manual labeling amount, we compare the amount of manual labeling required when the other domain adaptation methods use three alternative data selection methods for active learning to achieve performance similar to TDFG, as shown in Table 4.

Table 4. Comparison of manual labeling amount at similar performance. Four domain adaptation methods (VGG-11, PixelDA, DSN, and the active domain adaptation framework based on Target Domain Feature Generation (TDFG)), four data selection methods (random sample (RS), least confident (LC), high entropy (EN), and the Top-N labeled data selection method with high confidence (TopNHC)), and the same six source and target domain pairs as in Table 3 are involved in the comparison. The parameters compared are the number of samples that must be labeled and the percentage of labeling workload saved or added relative to the TopNHC data selection method.

According to Table 4, domain adaptation methods using the other data selection methods require more manual labeling to achieve the same performance: up to 20 times more labeling is needed to reach the migration performance of the TDFG framework. The only exception is DSN with random sampling on the SYNTH to MNIST_M pair, which requires 21.7% less manual labeling than the TopNHC method; overall, however, TopNHC has the more significant potential to reduce the manual labeling workload. Therefore, the TopNHC data selection method in the TDFG framework effectively reduces the manual labeling cost.
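The four selection rules compared in Table 4 can be summarized as in the sketch below, which scores samples from the classifier's softmax outputs; the exact scoring and tie-breaking used in the experiments may differ, so this is an illustrative assumption rather than the authors' implementation.

```python
import numpy as np

def select_for_labeling(probs, n, method="TopNHC"):
    """Return indices of the n samples chosen for labeling.

    probs: (n_samples, n_classes) softmax outputs of the target
    domain classifier.
    """
    confidence = probs.max(axis=1)        # top softmax probability
    if method == "RS":                    # random sample
        return np.random.choice(len(probs), n, replace=False)
    if method == "LC":                    # least confident predictions
        return np.argsort(confidence)[:n]
    if method == "EN":                    # highest predictive entropy
        entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
        return np.argsort(entropy)[-n:]
    if method == "TopNHC":                # Top-N with high confidence
        return np.argsort(confidence)[-n:]
    raise ValueError(f"unknown method: {method}")
```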

As a data selection method that efficiently picks out the most accurately predicted data, the Top-N labeled data selection method with high confidence (TopNHC) not only improves the migration performance of the model but also substantially reduces the manual labeling workload on these datasets. In addition, the experimental results indicate that the human-in-one-loop approach is effective: it complements the active domain adaptation framework based on Target Domain Feature Generation (TDFG) so that the human needs to participate in only one iteration of manual data labeling. The TopNHC data selection method is thus simple and effective, with practical application value.

The experimental comparisons show that TDFG largely overcomes the issue of label mismatch and, to some extent, narrows the performance gap relative to models trained with fully labeled target domain data. Within this framework, both the labeled fake target domain data produced by the image generator and the iterative generation training mechanism are beneficial to migration performance. Our framework therefore demonstrates good migration performance and practical implementation value.

Conclusion and Future Work

This paper delves into the practicalities of domain adaptation, introducing an innovative human-in-one-loop domain adaptation framework termed Active Domain Adaptation based on Target Domain Feature Generation (TDFG). By incorporating active learning, it introduces the Top-N High Confidence (TopNHC) labeled data selection strategy, significantly reducing the manual labeling burden. This strategy not only labels a portion of the target domain data but also enlarges the pool of learnable target domain features, and an iterative training process further boosts migration performance.

In contrast to prior human-in-the-loop techniques, our framework significantly slashes manual labeling efforts, necessitating oracle involvement in just a single iteration of manual data labeling. Against conventional DA methods, including DSN and PixelDA, TDFG demonstrates marked performance enhancements across various domain pairs.

In real-world application scenarios, such as meter reading, models are typically expected to deliver strong performance with minimal manual effort, and the engineers tasked with such problems are often facing them for the first time. Because TDFG is a one-loop method, it is valuable for newcomers who lack data across multiple scenarios. From the manual-effort perspective, TDFG requires human involvement only for mislabeling correction in the second stage; moreover, thanks to the improved performance of the target domain classifier and the tendency of TopNHC to select high-confidence samples, the manual workload remains relatively low.

From the perspective of HITL and active learning, we believe the data selection method should generally be designed for the application scenario. The innovation of TopNHC, and the key to its effectiveness, lies in selecting data that the model predicts with a high degree of confidence, rather than random or marginally challenging samples. TopNHC submits high-confidence samples for labeling because subsequent training can then effectively correct the model's erroneous perceptions, and the image generation part of the TDFG framework complements these data and thereby improves the performance of the target domain classifier. The TopNHC data selection method is therefore tightly coupled to the TDFG framework; when applying it in other human-in-the-loop machine learning scenarios, it is prudent to design a framework that can likewise exploit it.

The proposed TDFG framework has some limitations. Applying CycleGAN to classification tasks is constrained by its operational design, which requires running a distinct CycleGAN for each category. The manual annotation effort therefore grows linearly with the number of categories, which becomes especially burdensome in tasks involving a vast array of categories. Furthermore, training a separate CycleGAN per category introduces additional complexity: as the number of task categories grows, the training cost rises correspondingly, straining computational resources and slowing generation across the multiple CycleGANs, a critical bottleneck in the efficiency of the overall process.

Desirable future work on the TDFG framework includes the following. Firstly, given that human-in-one-loop domain adaptation is achievable with the TopNHC data selection method (the second stage of the TDFG framework does not require human involvement for data labeling), other data selection methods could be used in the first stage; this would slightly increase the human labeling workload but, thanks to their higher representativeness and diversity, would likely reduce the number of second-stage iterations and accelerate the performance improvement of the target domain classifier. Secondly, the components of the TDFG framework are loosely coupled, so lightweight and efficient style transfer methods could replace the CycleGAN-based image generator, potentially bringing superior migration performance. Thirdly, a method that automatically determines how many iterations the second stage should run deserves investigation, addressing the possible adverse effects of manually specifying the number of iterations.


Disclosure Statement

No potential conflict of interest was reported by the author(s).

Supplementary Material

Supplemental data for this article can be accessed online at https://doi.org/10.1080/08839514.2024.2349410

Additional information

Funding

This work is supported by the National Natural Science Foundation of China [U22A20106], the Foshan Science and Technology Innovation Special Foundation [BK22BF001], and the Foshan Higher Education Advanced Talents Foundation [BKBS202203]. The data that support the findings of this study are available in MNIST at DOI:10.1109/MSP.2012.2211477. These data were derived from the following resource available in the public domain: https://yann.lecun.com/exdb/mnist/.

References

  • Anwar, S., B. Mehrban, M. Ali, F. Hussain, and Z. Halim. 2021. A novel framework for generating handwritten datasets. Multimedia Tools and Applications 80 (6):9657–29. doi:10.1007/s11042-020-09545-7
  • Borgwardt, K. M., A. Gretton, M. J. Rasch, H.-P. Kriegel, B. Schölkopf, and A. J. Smola. 2006. Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics 22 (14):e49–57. doi:10.1093/bioinformatics/btl242
  • Bousmalis, K., N. Silberman, D. Dohan, D. Erhan, and D. Krishnan. 2017. Unsupervised pixel-level domain adaptation with generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Hawaii, USA, 3722–31.
  • Bousmalis, K., G. Trigeorgis, N. Silberman, D. Krishnan, and D. Erhan. 2016. Domain separation networks. Advances in Neural Information Processing Systems, Barcelona, Spain, 29.
  • Budd, S., E. C. Robinson, and B. Kainz. 2021. A survey on active learning and human-in- the-loop deep learning for medical image analysis. Medical Image Analysis 71:102062. doi:10.1016/j.media.2021.102062
  • Cao, Z., Y. Zhou, A. Yang, and S. Peng. 2021. Deep transfer learning mechanism for fine- grained cross-domain sentiment classification. Connection Science 33 (4):911–28. doi:10.1080/09540091.2021.1912711
  • Cheng, Z., S. Wang, D. Yang, J. Qi, M. Xiao, and C. Yan. 2024. Deep joint semantic adaptation network for multi-source unsupervised domain adaptation. Pattern Recognition 151:110409. doi:10.1016/j.patcog.2024.110409
  • Fu, Y., X. Zhu, and B. Li. 2013. A survey on instance selection for active learning. Knowledge and Information Systems 35 (2):249–83. doi:10.1007/s10115-012-0507-8
  • Ganin, Y., E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, and V. Lempitsky. 2016. Domain-adversarial training of neural networks. The Journal of Machine Learning Research 17 (1):2096–2030.
  • Ge, C., R. Huang, M. Xie, Z. Lai, S. Song, S. Li, and G. Huang. 2023. Domain adaptation via prompt learning. IEEE Transactions on Neural Networks and Learning Systems.
  • Ghifary, M., W. B. Kleijn, M. Zhang, and D. Balduzzi. 2015. Domain generalization for object recognition with multi-task autoencoders. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 2551–59.
  • Ghifary, M., W. B. Kleijn, M. Zhang, D. Balduzzi, and W. Li. 2016. Deep reconstruction- classification networks for unsupervised domain adaptation. European Conference on Computer Vision, Amsterdam, The Netherlands, 597–613.
  • Gómez-Carmona, O., D. Casado-Mansilla, D. López-de Ipiña, and J. García-Zubia. 2024. Human-in-the-loop machine learning: Reconceptualizing the role of the user in interactive approaches. Internet of Things 25:101048. doi:10.1016/j.iot.2023.101048
  • Goodfellow, I., J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. 2020. Generative adversarial networks. Communications of the ACM 63 (11):139–44. doi:10.1145/3422622
  • Kendall, A., and Y. Gal. 2017. What uncertainties do we need in Bayesian deep learning for computer vision? Advances in Neural Information Processing Systems, Long Beach, California, USA, 30.
  • Konyushkova, K., R. Sznitman, and P. Fua. 2017. Learning active learning from data. Advances in Neural Information Processing Systems, Long Beach, California, USA, 30.
  • Li, J., G. Li, and Y. Yu. 2024. Inter-domain mixup for semi-supervised domain adaptation. Pattern Recognition 146:110023. doi:10.1016/j.patcog.2023.110023
  • Long, M., H. Zhu, J. Wang, and M. I. Jordan. 2017. Deep transfer learning with joint adaptation networks. International Conference on Machine Learning, Sydney, Australia, 2208–17.
  • Lore, K. G., A. Akintayo, and S. Sarkar. 2017. Llnet: A deep autoencoder approach to natural low-light image enhancement. Pattern Recognition 61:650–62. doi:10.1016/j.patcog.2016.06.008
  • Ma, X., J. Gao, and C. Xu. 2021. Active universal domain adaptation. Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, Quebec, Canada, 8968–77.
  • Mayer, C., and R. Timofte. 2020. Adversarial sampling for active learning. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass, CO, USA, 3071–79.
  • Prabhu, V., A. Chandrasekaran, K. Saenko, and J. Hoffman. 2021. Active domain adaptation via clustering uncertainty-weighted embeddings. Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, Quebec, Canada, 8505–14.
  • Rangwani, H., A. Jain, S. K. Aithal, and R. V. Babu. 2021. S3VAADA: Submodular subset selection for virtual adversarial active domain adaptation. Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, Quebec, Canada, 7516–25.
  • Ren, P., Y. Xiao, X. Chang, P.-Y. Huang, Z. Li, B. B. Gupta, X. Chen, and X. Wang. 2021. A survey of deep active learning. ACM Computing Surveys (CSUR) 54 (9):1–40. doi:10.1145/3472291
  • Saito, K., D. Kim, S. Sclaroff, T. Darrell, and K. Saenko. 2019. Semi-supervised domain adaptation via minimax entropy. Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, South Korea, 8050–58.
  • Saito, K., K. Watanabe, Y. Ushiku, and T. Harada. 2018. Maximum classifier discrepancy for unsupervised domain adaptation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Utah, USA, 3723–32.
  • Simonyan, K., and A. Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
  • Su, J.-C., Y.-H. Tsai, K. Sohn, B. Liu, S. Maji, and M. Chandraker. 2020. Active adversarial domain adaptation. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass, CO, USA, 739–48.
  • Tan, F., and G. Zheng. 2023. Active learning for deep object detection by fully exploiting unlabeled data. Connection Science 35 (1). doi:10.1080/09540091.2023.2195596
  • Tzeng, E., J. Hoffman, N. Zhang, K. Saenko, and T. Darrell. 2014. Deep domain confusion: Maximizing for domain invariance. arXiv preprint arXiv:1412.3474.
  • Wang, M., and W. Deng. 2018. Deep visual domain adaptation: A survey. Neurocomputing 312:135–53. doi:10.1016/j.neucom.2018.05.083
  • Wang, X., L. Li, W. Ye, M. Long, and J. Wang. 2019. Transferable attention for domain adaptation. Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, Hawaii, USA, Vol. 33, 5345–52.
  • Wang, Y., Z. Yu, S. Liu, Z. Zhou, and B. Guo. 2023. Genie in the model: Automatic generation of human-in-the-loop deep neural networks for mobile applications. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7 (1):1–29. doi:10.1145/3580815
  • Wu, J., Z. Huang, Z. Hu, and C. Lv. 2023. Toward human-in-the-loop ai: Enhancing deep reinforcement learning via real-time human guidance for autonomous driving. Engineering 21:75–91. doi:10.1016/j.eng.2022.05.017
  • Wu, X., L. Xiao, Y. Sun, J. Zhang, T. Ma, and L. He. 2022. A survey of human-in-the-loop for machine learning. Future Generation Computer Systems 135:364–81. doi:10.1016/j.future.2022.05.014
  • Xiu, H., J. He, X. Zhang, L. Wang, and Y. Qi. 2022. Hrc-mcnns: A hybrid regression and classification multibranch cnns for automatic meter reading with smart shell. IEEE Internet of Things Journal 9 (24):25752–66. doi:10.1109/JIOT.2022.3197930
  • Yan, H., Y. Ding, P. Li, Q. Wang, Y. Xu, and W. Zuo. 2017. Mind the class weight bias: Weighted maximum mean discrepancy for unsupervised domain adaptation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Hawaii, USA, 2272–81.
  • Yosinski, J., J. Clune, Y. Bengio, and H. Lipson. 2014. How transferable are features in deep neural networks? Advances in Neural Information Processing Systems, Montreal, Quebec, Canada, 27.
  • Zhang, H., B. Dong, Q. Zheng, and B. Feng. 2022. Research on fast text recognition method for financial ticket image. Applied Intelligence 52 (15):18156–66. doi:10.1007/s10489-022-03467-7
  • Zhang, W., and D. Wu. 2020. Discriminative joint probability maximum mean discrepancy (DJP-MMD) for domain adaptation. 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, United Kingdom, 1–8.
  • Zhao, S., X. Yue, S. Zhang, B. Li, H. Zhao, B. Wu, R. Krishna, J. E. Gonzalez, A. L. Sangiovanni-Vincentelli, S. A. Seshia, et al. 2020. A review of single-source deep unsupervised visual domain adaptation. IEEE Transactions on Neural Networks and Learning Systems 33 (2):473–93. doi:10.1109/TNNLS.2020.3028503
  • Zhou, F., G. Wang, K. Zhang, S. Liu, and T. Zhong. 2023. Semi-supervised anomaly detection via neural process. IEEE Transactions on Knowledge and Data Engineering.
  • Zhu, J.-Y., T. Park, P. Isola, and A. A. Efros. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 2223–32.