Biomedical Engineering

Automated detection of leukemia in blood microscopic images using image processing techniques and unique features: Cell count and area ratio

Article: 2304484 | Received 12 Jul 2023, Accepted 08 Jan 2024, Published online: 23 Jan 2024

Abstract

Leukemia is a type of cancer that affects the body’s blood-forming tissue, in which the bone marrow produces an excessive number of abnormal white blood cells (WBCs) that do not function properly. Leukemia is typically diagnosed by a trained expert who visually inspects characteristic features and determines the type of cancer. Digital image processing techniques, however, have been steadily improving in healthcare, particularly in diagnosing different diseases and helping doctors make treatment decisions. This paper presents a system that automatically detects leukemia in blood microscopic images and classifies them as normal or abnormal (with leukemia). Two main techniques were used: counting the number of WBCs relative to red blood cells (RBCs), and measuring the average ratio between the area of each WBC and the area of the bounding box drawn around it. The classification accuracy was 91.8% and 88.2% for the two techniques, respectively. These features can also be used in machine learning applications, and the presented system is faster and more efficient than the traditional diagnostic processes used in hospitals.

1. Introduction

Human blood is made up of plasma and three types of cells: white blood cells (WBCs), red blood cells (RBCs), and platelets. Each of these cells performs a specific function. WBCs help the body fight infections and diseases; RBCs carry oxygen from the lungs to the body’s tissues and carry carbon dioxide back to the lungs; and platelets assist in blood clotting and control bleeding. Leukemia is a type of cancer that affects WBCs. When a person has leukemia, the body produces an excessive amount of one type of blood cell and not enough of another, resulting in abnormal cells (Desai and Shet, 2018). These abnormal cells look and function differently from normal blood cells. Leukemia can arise from two types of abnormal WBCs: lymphoid cells and myeloid cells. Leukemia that arises from lymphoid cells is called lymphocytic or lymphoblastic leukemia, and leukemia that arises from myeloid cells is called myelogenous or myeloid leukemia. Overall, leukemia is classified along two axes: acute or chronic, and lymphoid or myeloid. Leukemia is typically diagnosed by a trained expert who looks for abnormal cells in microscopic images, but this process can be difficult because of the variety of features and unclear images (Desai and Shet, 2018; Nimesh and Ashutosh, 2015). Image enhancement and machine learning techniques can improve the visual quality of medical images and help localize the relevant pixels, resulting in a faster and more accurate diagnosis (Nimesh and Ashutosh, 2015).

Digital image processing (DIP) is commonly used in cancer detection, for example in classifying brain tumors, detecting lung cancer cells (Al-Tarawneh, 2012), and detecting breast cancer in mammographic images (CharanKhan and Khurshid, 2018). Some research has also focused on enhancing blood microscopic images and detecting leukemia automatically.

A systematic review by Raina et al. (2023) investigates deep learning methods for the classification and detection of acute leukemia. It emphasizes the importance of automatic detection for producing robust and accurate results, and it covers preprocessing, augmentation, segmentation, feature extraction, and the challenges that authors faced with different datasets. Most of the research in this area uses the F1-score and accuracy as performance metrics.

Jagadev and Virani (2017) aimed to detect leukemia and determine whether it is acute lymphoblastic leukemia (ALL), acute myeloid leukemia (AML), chronic lymphocytic leukemia (CLL), or chronic myeloid leukemia (CML). Because the morphological components of normal and leukemic lymphocytes differ significantly, they extracted various features from segmented lymphocyte images and then classified leukemia into its types and subtypes using a support vector machine (SVM) classifier.

In the review by Ghaderzadeh et al. (2021), a systematic search was carried out in Google Scholar and four databases (PubMed, Scopus, Web of Science, and ScienceDirect) using the keywords leukemia, peripheral blood smear (PBS) images, detection, diagnosis, and classification. The search initially located 116 articles; after the inclusion and exclusion criteria were applied, 16 publications remained as the research population. The review provides an extensive, methodical overview of the available machine learning (ML)-based PBS image processing models for leukemia detection and classification. The average accuracy of the ML techniques used to identify leukemia in PBS images was above 97%, suggesting that ML can produce remarkable results in this task.

Compared with its predecessors, deep learning (DL) outperformed all other ML techniques in sensitivity and precision when identifying various leukemia cases. Although ML has many uses in the analysis of leukemia images, the use of ML algorithms to identify acute lymphoblastic leukemia (ALL) has attracted the most interest in the hematology and artificial intelligence domains (Ghaderzadeh et al., 2021).

Figure 1(a) shows an example of a normal blood microscopic image, with many more RBCs than WBCs and normally shaped cells, while Figure 1(b) shows a blood microscopic image with leukemia, in which the numbers and shapes of the WBCs and RBCs are abnormal. Although physicians can detect leukemia easily in some cases, in others the difference between normal and abnormal cells is less clear, and the disease may be misclassified (Gupta et al., 2018).

Figure 1. Examples of microscopic images: (a) normal blood microscopic image, (b) blood microscopic image with leukemia (Gupta et al., 2018, 2020).

In this research, unique features and image processing techniques are used to detect leukemia in microscopic images automatically, faster and more accurately than the regular detection methods used in hospitals. Image processing techniques prepare the images for the feature extraction step, and the features are based on the number and area of the WBCs found in the microscopic images.

2. Literature review

P. Jagadev and H. G. Virani analyzed a dataset of 220 blood smear images from leukemic and normal patients using three image segmentation algorithms: K-means clustering, marker-controlled watershed, and HSV color-based segmentation. Various features were extracted from the segmented lymphocyte images to capture the differences in the morphological components of normal and leukemic lymphocytes, and these features were then used by an SVM classifier to classify the leukemia into its types (Jagadev and Virani, 2017).

Patel and colleagues also employed a K-means clustering approach for blood cancer detection. Their preprocessing used histogram equalization and the Zack algorithm, and features such as the mean, standard deviation, color, area, and perimeter were extracted and used for SVM classification. The system was tested on an image dataset and achieved 93.57% accuracy (Nimesh and Ashutosh, 2015).

In the research by Yadav et al. (2018), the image was partitioned into smaller parts to ensure proper scanning, and the number of white blood cells (WBCs) in the processed image was then counted. In addition to segmentation, the work separated the WBCs from the red blood cells (RBCs) and platelets. A feature extraction process was used to assess the images qualitatively and to differentiate between cancerous and non-cancerous data with good accuracy.

In 2019, Mohammad and colleagues developed an automated system for detecting blood cancer using image processing techniques. The system first applied image enhancement and contrast stretching, and then extracted features using thresholding and region analysis, including the length of the infected parts. The system was tested on multiple images, and the cells were classified as cancerous or non-cancerous. The classification accuracy, determined from the testing outcomes and from measurements of the shapes and sizes of the cancer cells, was 91%. The authors also presented other steps, such as removing noise from the image, thresholding, background extraction, and image segmentation using Sobel edge detection with hole filling to identify regions (Mohammad et al., 2019).

Raje and Rangole reported promising results. They segmented the image based on statistical parameters such as the mean and standard deviation, and used these parameters to separate the WBCs from the other components of the blood image. The geometrical features area and perimeter were then computed for the WBC nucleus and used to build a diagnostic prediction system for leukemia. Their method was applied successfully to a large number of images, with good accuracy across images of different quality (Raje and Rangole, 2014).

Yadav et al. presented another automated blood cancer detection system, in which images of cancer-infected blood cells were collected and then filtered, enhanced, and histogram-equalized. Segmentation was performed using K-means clustering, followed by feature extraction, in which features of the nucleus were extracted using the gray-level co-occurrence matrix (GLCM) and GLDM. Finally, a support vector machine (SVM) classifier, supported by data in a knowledge base, decided whether a new cell was cancerous (Yadav et al., 2018).

The present work uses two main features to detect leukemia in blood: the number of WBCs and their areas. The entire process is implemented in MATLAB. The primary objective of this paper is to create an automated system that detects leukemia in blood microscopic images and classifies new images as normal or abnormal (leukemia) using novel features that have not been used in the literature. The method aims to be faster and easier than the regular diagnosis methods commonly used in hospitals.

3. Methodologies

3.1. Dataset

The dataset used in this project combines “Normal” and “Abnormal” microscopic images, with 85 samples in each category (170 images in total). The images are RGB images in .jpg and .bmp format and come from the work of Gupta et al. For microscopic medical images, Gupta et al. propose a geometry-inspired chemical-invariant and tissue-invariant stain normalization approach (GCTI-SN). By using the geometry of the underlying color vector space, GCTI-SN accounts for variations in illumination, stain chemical composition, and stain quantity within a single, cohesive framework (Gupta et al., 2018, 2020).

3.2. Method description

The block diagram in Figure 2 describes the procedure used in this work. It begins with the dataset and image acquisition, followed by preprocessing, in which noise is removed and the images are prepared for the next step. The third step is feature extraction, in which two main features are extracted from each image in the dataset using two techniques: “cell counting,” which counts the number of WBCs and RBCs, and “cell area,” which measures the area of the WBCs in the microscopic image. The last step is classification, in which the images in the dataset are assigned to one of two classes depending on the features and a specific threshold level. Figure 3 illustrates the feature matrix $A$ obtained from the images in the dataset, along with the label vector $L$, whose entries are 1 for normal images and 0 for abnormal images. In the feature matrix, $F_{NM}$ denotes the $N$-th feature extracted from the $M$-th image.
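To make the layout of the feature matrix $A$ and the label vector $L$ concrete, here is a minimal Python sketch. It assumes one row per image and one column per feature (the orientation in Figure 3 is an assumption here), and the function name `build_feature_matrix` and all numbers are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def build_feature_matrix(
    samples: Sequence[Tuple[Sequence[float], bool]],
) -> Tuple[List[List[float]], List[int]]:
    """Stack per-image feature vectors into a matrix A and a label vector L.

    Each sample is (features, is_normal). As in the text, L[i] is 1 for a
    normal image and 0 for an abnormal (leukemia) image.
    Layout assumption: one row per image, one column per feature.
    """
    A: List[List[float]] = []
    L: List[int] = []
    n_features: Optional[int] = None
    for features, is_normal in samples:
        row = [float(f) for f in features]
        if n_features is None:
            n_features = len(row)
        elif len(row) != n_features:
            raise ValueError("every image must have the same number of features")
        A.append(row)
        L.append(1 if is_normal else 0)
    return A, L


# Illustrative values only: [WBC percentage, mean WBC area ratio] per image.
A, L = build_feature_matrix([
    ([2.3, 0.81], True),
    ([35.7, 0.42], False),
    ([4.1, 0.77], True),
])
print(A)
print(L)
```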

Figure 2. Block diagram of the whole process.

Figure 3. Example of the feature matrix and the label vector used in this work.

3.3. Image preprocessing

In this stage, each blood microscopic image is processed based on its color ratios, since it is an RGB image. RGB images, also known as true color images, are stored as an m-by-n-by-3 data array that defines the red, green, and blue color components of each pixel. The color of each pixel is determined by the combination of the red, green, and blue intensities stored in each color plane at that pixel’s location (Gonzalez and Wintz, 1977). In this work, the microscopic image is separated into its basic color planes (red, green, and blue), as seen in Figure 4. Based on the colors of the white and red cells in the RGB microscopic image, the image is then separated into two images: image 1 and image 2. In image 1, the blue plane ($B_{\text{plane}}$) is extracted using Eq. (1), which isolates the blue objects in the image by subtracting 0.6 times the red and green planes:

$$B_{\text{plane}} = RGB - 0.6\,R_{\text{plane}} - 0.6\,G_{\text{plane}} \qquad (1)$$

Figure 4. Original microscopic image and the red, green, and blue planes: (a) original microscopic image, (b) red plane from the original image, (c) green plane from the original image, (d) blue plane from the original image.

In the same way, the red objects in image 2 are extracted using Eq. (2), in which 0.6 times the blue and green planes is subtracted from the RGB microscopic image:

$$R_{\text{plane}} = RGB - 0.6\,B_{\text{plane}} - 0.6\,G_{\text{plane}} \qquad (2)$$

In both equations, $R_{\text{plane}}$ denotes the red plane, $G_{\text{plane}}$ the green plane, and $B_{\text{plane}}$ the blue plane of the original microscopic image ($RGB$). Figure 4 shows an example of one microscopic image and the red, green, and blue planes extracted from it.
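A minimal pure-Python sketch of this color-separation idea follows. It assumes the single-channel form that this section later quotes for Figure 6 (a main color plane minus k times each of the other two planes) instead of subtracting from the full three-channel array, and it treats k as a free parameter (0.6 in Eqs. (1) and (2), 0.5 in the Figure 6 formulas). Clipping negative results to 0 is also an assumption, modeling the saturation of unsigned 8-bit arithmetic. The function name and pixel values are hypothetical.

```python
from typing import List

Plane = List[List[float]]


def weighted_color_difference(main: Plane, other1: Plane, other2: Plane, k: float) -> Plane:
    """Pixel-wise max(main - k*other1 - k*other2, 0).

    main=B with others R, G highlights purple/blue objects (WBCs);
    main=R with others B, G highlights pink/red objects (RBCs).
    Negative results are clipped to 0, as unsigned 8-bit arithmetic would do.
    """
    return [
        [max(m - k * a - k * b, 0.0) for m, a, b in zip(row_m, row_a, row_b)]
        for row_m, row_a, row_b in zip(main, other1, other2)
    ]


# Hypothetical 1x3 image: a purple pixel, a pink pixel, and near-white background.
R = [[150.0, 230.0, 240.0]]
G = [[50.0, 150.0, 240.0]]
B = [[200.0, 170.0, 240.0]]

wbc_like = weighted_color_difference(B, R, G, k=0.6)  # keeps the purple pixel
rbc_like = weighted_color_difference(R, B, G, k=0.6)  # keeps the pink pixel
print(wbc_like)
print(rbc_like)
```

Only the purple pixel survives in the WBC-like plane and only the pink pixel in the RBC-like plane, which is the separation the two equations aim for.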

The fractions of the removed colors were set by trial and error: the WBCs appear purple in the microscopic images, while the RBCs appear light pink. After the WBCs and the RBCs are extracted, the resulting images are converted into binary images and the noise is removed.

Binary image pixels have only two possible intensity values and are normally displayed in black and white. Binary images are often produced by thresholding a grayscale or color image to separate an object from the background. The object pixels (usually white) are referred to as the foreground, and the remaining pixels (usually black) as the background. Depending on the image being thresholded, however, this polarity may be inverted, in which case the object is displayed with 0 and the background has a non-zero value.

Figure 5 shows one example image after conversion to binary: the top image is obtained after extracting the WBCs, and the bottom image after extracting the RBCs. Many trials were carried out to select the threshold level best suited to the images in the project dataset.
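The thresholding step, including the inverted polarity mentioned above, can be sketched as follows. The threshold of 20 and the input values are hypothetical; the paper chose its threshold by trial and error and does not give the value in this section.

```python
from typing import List


def to_binary(plane: List[List[float]], threshold: float, object_is_bright: bool = True) -> List[List[int]]:
    """Threshold a single-channel image into a 0/1 mask.

    object_is_bright=True marks pixels above the threshold as foreground (1).
    object_is_bright=False handles the inverted polarity, where the objects
    are the dark pixels (at or below the threshold).
    """
    return [
        [1 if ((v > threshold) if object_is_bright else (v <= threshold)) else 0 for v in row]
        for row in plane
    ]


# Hypothetical color-difference output and threshold.
plane = [[80.0, 0.0, 5.0], [0.0, 38.0, 60.0]]
mask = to_binary(plane, threshold=20.0)
inverted = to_binary(plane, threshold=20.0, object_is_bright=False)
print(mask)
print(inverted)
```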

Figure 5. Image after conversion to binary: the top-right image shows the WBCs, and the bottom image shows the RBCs.

In the last preprocessing step, noise is reduced with a filter that removes small objects from the binary image: any connected component (object) with fewer than a specific number of pixels is removed. This operation is known as “area opening,” and the minimum object size is set to 100 pixels. Figure 6 shows the block diagram for extracting the WBCs: the WBC color is first isolated by subtracting part of the red and green planes from the blue plane using “WBC_color = B - 0.5*R - 0.5*G”, the noise is removed, and the WBCs are then counted. The RBCs are counted in the same manner, except that the RBC color is extracted in the first step using “RBC_color = R - 0.5*B - 0.5*G”. In these equations, R, G, and B refer to the red, green, and blue planes of the original RGB image.
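Area opening followed by cell counting can be sketched in plain Python as below. In MATLAB this operation is usually performed with `bwareaopen(BW, P)`, which removes objects with fewer than `P` pixels; the paper does not name the function it used, and the 8-connectivity chosen here (the `bwareaopen` default for 2-D images) is an assumption. The function names and the test mask are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Binary = List[List[int]]
Pixel = Tuple[int, int]


def connected_components(bw: Binary) -> List[List[Pixel]]:
    """Group foreground pixels into 8-connected objects using breadth-first search."""
    rows, cols = len(bw), len(bw[0])
    seen = [[False] * cols for _ in range(rows)]
    components: List[List[Pixel]] = []
    for r in range(rows):
        for c in range(cols):
            if not bw[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels: List[Pixel] = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and bw[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def area_open_and_count(bw: Binary, min_pixels: int = 100) -> Tuple[Binary, int]:
    """Remove objects with fewer than min_pixels pixels; return the cleaned mask and object count."""
    cleaned = [[0] * len(row) for row in bw]
    kept = 0
    for pixels in connected_components(bw):
        if len(pixels) >= min_pixels:
            kept += 1
            for y, x in pixels:
                cleaned[y][x] = 1
    return cleaned, kept


# Hypothetical 14x14 mask: one 10x10 "cell" (100 pixels) plus two noise specks.
bw = [[0] * 14 for _ in range(14)]
for y in range(1, 11):
    for x in range(1, 11):
        bw[y][x] = 1
bw[0][13] = bw[1][13] = 1  # 2-pixel speck
bw[12][12] = 1             # 1-pixel speck

cleaned, count = area_open_and_count(bw, min_pixels=100)
print(len(connected_components(bw)), count)
```

The mask starts with three objects; only the 100-pixel cell survives the opening, so it is the only one counted.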

Figure 6. Steps of extracting the WBCs.

3.4. Feature extraction

Many methods are used in the literature to extract features from blood cells; this work proposes two techniques. The first is the cell counting feature, which is based on the percentage of blast cells (WBCs) relative to the other cells in the same image. A diagnosis of ALL generally requires that at least 20% of the cells in the bone marrow are blasts, whereas under normal circumstances blasts make up no more than 5% of bone marrow cells. Accordingly, the number of WBCs is counted relative to the total number of cells (WBCs and RBCs) in the same microscopic image using Eq. (3):

$$\text{WBCs}\,\% = \left(\frac{\text{number of WBCs}}{\text{number of all cells}}\right) \times 100\% \qquad (3)$$

The threshold is set to 20%: if the resulting WBC percentage is equal to or greater than 20, the image is classified as “Leukemia”; if it is less than 20, the image is classified as “Normal.”
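The cell-count rule of Eq. (3) with the 20% threshold reduces to a few lines. This sketch assumes that “all cells” means the detected WBCs plus the detected RBCs; the function names and cell counts are hypothetical.

```python
def wbc_percentage(n_wbc: int, n_rbc: int) -> float:
    """Eq. (3), assuming 'all cells' means the detected WBCs plus RBCs."""
    total = n_wbc + n_rbc
    if total == 0:
        raise ValueError("no cells detected in the image")
    return 100.0 * n_wbc / total


def classify_by_count(n_wbc: int, n_rbc: int, threshold: float = 20.0) -> str:
    """'Leukemia' if the WBC percentage reaches the threshold, otherwise 'Normal'."""
    return "Leukemia" if wbc_percentage(n_wbc, n_rbc) >= threshold else "Normal"


# Hypothetical (WBC, RBC) counts for three images.
for n_wbc, n_rbc in [(3, 97), (5, 15), (20, 80)]:
    print(n_wbc, n_rbc, wbc_percentage(n_wbc, n_rbc), classify_by_count(n_wbc, n_rbc))
```

The third case sits exactly on the 20% boundary and is classified as leukemia, following the “equal to or greater than” wording above.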

The second feature is the cell area feature, given in Eq. (4). A bounding box is drawn around each white blood cell (WBC) found in the image, and the area of the cell is calculated as a ratio of the area of its bounding box. In the microscopic images, normal cells typically occupy a larger portion of their bounding box, while abnormal cells typically occupy a smaller portion. Normal cells are therefore expected to have large area values, and abnormal cells smaller area values.

$$\text{Area}\,\% = \left(\frac{\text{Cell area}}{\text{Bounding box area}}\right) \times 100\% \qquad (4)$$
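Eq. (4) for a single segmented cell can be sketched as follows, with the cell represented as a list of (row, column) pixel coordinates such as a connected-component labeling step produces. This ratio is the same quantity MATLAB’s `regionprops` reports as `Extent`; whether the authors used that property is not stated, and the example shapes are hypothetical. The function returns a fraction, so multiply by 100 to obtain the percentage in Eq. (4).

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int]


def area_ratio(pixels: Iterable[Pixel]) -> float:
    """Eq. (4) as a fraction: object pixel count / axis-aligned bounding box area."""
    pts = list(pixels)
    if not pts:
        raise ValueError("empty object")
    ys = [y for y, _ in pts]
    xs = [x for _, x in pts]
    box_area = (max(ys) - min(ys) + 1) * (max(xs) - min(xs) + 1)
    return len(pts) / box_area


square = [(y, x) for y in range(4) for x in range(4)]  # fills its whole box
diagonal = [(i, i) for i in range(4)]                 # thin, elongated object
plus = [(2, x) for x in range(5)] + [(y, 2) for y in range(5) if y != 2]  # cross shape

for name, obj in [("square", square), ("diagonal", diagonal), ("plus", plus)]:
    print(name, area_ratio(obj), 100 * area_ratio(obj), "%")
```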

4. Results and discussion

This section presents the results of classifying blood microscopic images using the extracted features, together with two examples produced with a graphical user interface (GUI). A GUI is a pictorial interface that lets users interact with an application through intuitive controls, such as buttons, sliders, and input fields, without needing to understand the underlying programming language. In MATLAB, GUIs (also known as apps) enable point-and-click control of software applications, so users do not have to learn commands or programming syntax. The GUI designed in this work, shown in Figures 7 and 8, lets physicians choose the image they want to classify as normal or abnormal by pressing the “Load image” button. The GUI then shows the resulting image after each processing step, and the box at the bottom of the GUI lists output parameters such as the number of WBCs, the number of RBCs, and the WBC and RBC percentages. Finally, the classification result, “Normal” or “Abnormal,” is displayed in the “Result” box.

Figure 7. GUI shows an example of classifying the input image into “normal.”

Figure 8. GUI shows an example of classifying the input image into “leukemia.”

4.1. Number of WBCs

In this method, the number of WBCs is calculated relative to the number of RBCs in each microscopic image, and the image is classified as “Normal” or “Leukemia” based on the result. If the resulting value is equal to or greater than 20, the image is classified as “Leukemia”; if it is less than 20, the image is classified as “Normal.” Figure 7 shows the GUI applied to one image, which is classified as “Normal” because its percentage of WBCs relative to RBCs is 2.3, less than 20. Figure 8 shows the GUI applied to another image, which is classified as “Leukemia” because this percentage is 35.7. In both examples, the resulting image after each processing step is shown to illustrate how the whole procedure works.

4.2. Area of the WBCs

Here, the area of each white blood cell (WBC) is measured and divided by the area of the bounding box drawn around it. If a WBC covers more than 80% of its bounding box area, it is classified as a normal cell. The average area ratio of the WBCs in each image is then calculated and plotted in Figure 9 for all 170 images in the dataset, 85 of which are normal and 85 of which show leukemia. As the figure shows, the average WBC area ratio in the normal images is mostly above 0.6. The threshold for the average area ratio is therefore set to 0.6 for further classification.
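The two thresholds described above, the per-cell 80% rule and the image-level 0.6 rule on the average ratio, can be sketched as below. Treating an average of exactly 0.6 as normal is an assumption, since the text does not specify the boundary case; the function names and ratios are hypothetical.

```python
from statistics import mean
from typing import Sequence


def is_normal_cell(ratio: float, cell_threshold: float = 0.8) -> bool:
    """Per-cell rule: a WBC that fills more than 80% of its bounding box is normal."""
    return ratio > cell_threshold


def classify_by_area(ratios: Sequence[float], image_threshold: float = 0.6) -> str:
    """Image-level rule: mean WBC area ratio at or above 0.6 -> 'Normal' (boundary assumed)."""
    if not ratios:
        raise ValueError("no WBCs detected in the image")
    return "Normal" if mean(ratios) >= image_threshold else "Leukemia"


# Hypothetical per-WBC area ratios for two images.
image_a = [0.85, 0.78, 0.90]
image_b = [0.40, 0.55, 0.70]
print(classify_by_area(image_a), [is_normal_cell(r) for r in image_a])
print(classify_by_area(image_b), [is_normal_cell(r) for r in image_b])
```

Note that an image can be classified as normal on average even when some of its individual cells fail the stricter 80% rule, as in `image_a`.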

Figure 9. Average area of WBCs in each image in the dataset.

4.3. Classification process

In this section, the feature extraction processes presented in the previous sections (cell count and cell area) are applied to the whole dataset of 85 normal and 85 leukemia images. A classification loop then uses the calculated features to classify each image as normal or abnormal (leukemia). After the loop is completed, a confusion matrix (Table 1) is created to present the classification results, and the accuracy (AC) is calculated. The accuracy is the probability that the system’s classification is correct and is given by Eq. (5) (Al-Ghraibah et al., 2020; Altayeb and Al-Ghraibah, 2022), where TP and FP are the numbers of true and false positives, and TN and FN are the numbers of true and false negatives:

$$AC = \frac{TP + TN}{TP + FP + TN + FN} \qquad (5)$$

Table 1. Confusion matrix.

The true positive rate (TPR), also called sensitivity or recall, and the true negative rate (TNR), also called specificity, are also computed from the confusion matrix using Eqs. (6) and (7), respectively (Al-Ghraibah et al., 2023):

$$TPR = \frac{TP}{TP + FN} \qquad (6)$$

$$TNR = \frac{TN}{TN + FP} \qquad (7)$$

The positive predictive value (PPV), or precision, is given by Eq. (8), and the negative predictive value (NPV) by Eq. (9) (Al-Ghraibah et al., 2023):

$$PPV = \frac{TP}{TP + FP} \qquad (8)$$

$$NPV = \frac{TN}{TN + FN} \qquad (9)$$

Table 2 shows the confusion matrix for classifying the images in the dataset as normal or leukemia using the first technique, cell count. The accuracy (AC) is about 92% (156 of 170 images): 75 of the normal images were correctly classified as “Normal,” while 10 were incorrectly classified as leukemia, and 81 of the “leukemia” samples were correctly classified, while four were incorrectly classified as normal. Table 3 shows the confusion matrix for the second technique, cell area. Here, 70 normal images were correctly classified, while 15 were incorrectly classified as leukemia; most of the leukemic images were correctly classified, with only five misclassified. The calculated accuracy (AC) in this case is about 88% (150 of 170 images).

Comparing the two proposed techniques, the first (cell count) detected leukemia in microscopic images better than the second (cell area). Both techniques are nevertheless faster and easier than other methods found in the literature: processing and classifying one image usually takes less than a second, and processing and classifying a group of images together takes only about 3–4 s.
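As a worked check of Eqs. (5) to (9), the sketch below computes all five metrics from the counts reported above for Tables 2 and 3. The paper does not state which class is positive; this sketch treats leukemia as the positive class (an assumption), so TPR/TNR and PPV/NPV would swap roles if normal images were taken as positive. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # leukemia images classified as leukemia
    fp: int  # normal images classified as leukemia
    tn: int  # normal images classified as normal
    fn: int  # leukemia images classified as normal


def metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Eqs. (5)-(9): accuracy, sensitivity, specificity, PPV, and NPV."""
    return {
        "AC": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
        "TPR": c.tp / (c.tp + c.fn),
        "TNR": c.tn / (c.tn + c.fp),
        "PPV": c.tp / (c.tp + c.fp),
        "NPV": c.tn / (c.tn + c.fn),
    }


# Counts from the text for Tables 2 and 3, with leukemia as the positive class (assumption).
cell_count = ConfusionCounts(tp=81, fp=10, tn=75, fn=4)
cell_area = ConfusionCounts(tp=80, fp=15, tn=70, fn=5)

for name, counts in [("cell count", cell_count), ("cell area", cell_area)]:
    print(name, {k: round(100 * v, 2) for k, v in metrics(counts).items()})
```

Under these assumptions, the accuracies evaluate to 156/170 (about 91.76%) and 150/170 (about 88.24%), consistent with the rounded 92% and 88% quoted above.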

Table 2. Confusion matrix of the classification using cell count.

Table 3. Confusion matrix of classifying the images using cell area.

5. Conclusions

This paper presents a simple, fast, and accurate leukemia detection system based on digital image processing. In addition, a graphical user interface (GUI) presents the process and the classification result in a clear and familiar way. The extracted features are “Cell Count” and “Cell Area.” Accuracies of 91.76% and 88.24% were achieved with the first and second techniques, respectively. Other features could be extracted in the future, and all of them could be used in machine learning classifiers such as support vector machines (SVM) or in deep learning methods such as neural networks (NNs) to build a model that automatically classifies large numbers of blood microscopic images as normal or leukemic.

Acknowledgement

We would like to thank Mustafa Malullah Najm and Khetam Sameer Jalal for their help in this research.

Disclosure statement

No potential conflict of interest was reported by the authors.

References

  • Al-Ghraibah, A., Al-Ayyad, M., & Elkhalil, H. (2020). Tibia Fracture Detection using a Modified Edge Detection Method based on Bone Length. Journal of Engineering Science and Technology, 15(1), 249–260.
  • Al-Ghraibah, A., Altayeb, M., & Alnaimat, F. A. (2023). An automated system to distinguish between corona and viral pneumonia chest diseases based on image processing techniques. Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, 1–13. https://doi.org/10.1080/21681163.2023.2261575
  • Al-Tarawneh, M. S. (2012). Lung cancer detection using image processing techniques. Leonardo Electronic Journal of Practices and Technologies, 11(21), 147–158.
  • Altayeb, M., & Al-Ghraibah, A. (2022). Classification of three pathological voices based on specific features groups using support vector machine. International Journal of Electrical & Computer Engineering, 12(1), 2088–8708.
  • CharanKhan, M. J. S., & Khurshid, K. (2018). Breast cancer detection in mammograms using convolutional neural network. In Proceedings of the iCoMET, IEEE (pp. 1–5).
  • Desai, P. G. F., & Shet, G. (2018). Detection of leukemia using image processing. International Journal of Advance Research in Science and Engineering, 7(3), 149–156.
  • Ghaderzadeh, M., Asadi, F., Hosseini, A., Bashash, D., Abolghasemi, H., & Roshanpour, A. (2021). Machine learning in detection and classification of leukemia using smear blood images: a systematic review. Scientific Programming, 2021, 1–14. https://doi.org/10.1155/2021/9933481
  • Gonzalez, R. C., & Wintz, P. (1977). Digital image processing. In Applied Mathematics and Computation (Vol. 13, pp. 451). Addison-Wesley Publishing Co., Inc.
  • Gupta, A., Mallick, P., Sharma, O., Gupta, R., & Duggal, R. (2018). PCSeg: Color model driven probabilistic multiphase level set based tool for plasma cell segmentation in multiple myeloma. PloS One, 13(12), e0207908. https://doi.org/10.1371/journal.pone.0207908
  • Gupta, A., Duggal, R., Gehlot, S., Gupta, R., Mangal, A., Kumar, L., Thakkar, N., & Satpathy, D. (2020). GCTI-SN: Geometry-inspired chemical and tissue invariant stain normalization of microscopic medical images. Medical Image Analysis, 65, 101788. https://doi.org/10.1016/j.media.2020.101788
  • Jagadev, P., & Virani, H. G. (2017). Detection of leukemia and its types using image processing and machine learning. In International Conference on Trends in Electronics and Informatics (ICEI), Tirunelveli, India (pp. 522–526). https://doi.org/10.1109/ICOEI.2017.8300983
  • Mohammad, A. B., Sainath, P., Babu, K. C., & Mouli, N. C. (2019). Detection of leukemia using image processing. International Journal of Innovative Technology and Exploring Engineering, 9(2), 2914–2918. https://doi.org/10.35940/ijitee.B7540.129219
  • Nimesh, P., & Ashutosh, M. (2015). Automated leukaemia detection using microscopic images. Procedia Computer Science, 58, 635–642.
  • Raina, R., Gondhi, N. K., Singh, D., Kaur, M., Lee, H.-N., & Chaahat. (2023). A systematic review on acute leukemia detection using deep learning techniques. Archives of Computational Methods in Engineering, 30(1), 251–270. https://doi.org/10.1007/s11831-022-09796-7
  • Raje, C., & Rangole, J. (2014). Detection of Leukemia in microscopic images using image processing. In International Conference on Communication and Signal Processing (pp. 255–259). https://doi.org/10.1109/ICCSP.2014.6949840
  • Yadav, C., Zele, S., Patil, M. T., Bombadi, M. V., & Chaudhari, M. T. (2018). Automatic blood cancer detection using image processing. International Journal of Recent Trends in Engineering & Research (IJRTER), 4(3), 2455–2457.