SIIM-ACR Medical Chest X-Ray Segmentation by Deep Learning
Is it possible for AI to identify collapsed lungs (Pneumothorax) from a chest X-ray?
Artificial intelligence has spread across a wide range of industries; believe it or not, many of the apps on your phone use AI to some degree, and AI is increasingly used in medicine for disease diagnosis. In the healthcare industry, digital imaging is a very common way to diagnose major diseases, and AI now greatly assists in such diagnosis by analyzing X-rays, CT scans, and other forms of imaging. In this blog, I’ll discuss my work on a case study called “SIIM-ACR Pneumothorax Segmentation,” which involves identifying lung disease from chest X-rays.
So let’s look at the outline of the blog:
- Pre-requisite Knowledge
- Business Problem
- Mapping Business Problems into Deep Learning Problems
- Existing Approaches
- My First cut approach
- Exploratory Data Analysis
- Preparing data for model
- Data Preprocessing
- Preparation of Deep Learning models
- Summary
- Deployment
- Future work
- Code Repository
- References
1. Pre-requisite Knowledge
What is Pneumothorax?
Imagine suddenly gasping for air, helplessly breathless for no apparent reason. Could it be a collapsed lung?
Image Credit: https://www.firstaidforfree.com/what-is-a-spontaneous-pneumothorax/
Pneumothorax can be caused by a blunt chest injury, damage from underlying lung disease, or most horrifying — it may occur for no obvious reason at all. On some occasions, a collapsed lung can be a life-threatening event.
What Are the Symptoms of a Collapsed Lung?
Sharp, stabbing chest pain that worsens with coughing or deep inhalation, sometimes radiating to the shoulder and/or back, and a dry, hacking cough are all symptoms of a collapsed lung. In extreme cases, an individual can go into shock, a life-threatening condition that needs urgent medical attention. If you have any form of chest pain or suspect a pneumothorax, see a doctor.
How Is a Collapsed Lung Treated?
If there is no underlying lung disease, a minor pneumothorax can heal on its own in one to two weeks. A larger pneumothorax, on the other hand, is typically treated by extracting the air from the chest cavity with a needle attached to a syringe. A chest tube may also be inserted and left in place for several days. Surgery may be needed in some cases.
Who can be affected by Pneumothorax?
- Pneumothoraxes can happen to people who are very tall and thin. The lung collapses in this state after little to no trauma.
- Smokers and men between the ages of 20 and 40 are more likely to experience spontaneous pneumothorax. Smoking has been linked to an increased risk of developing a spontaneous pneumothorax.
What is a DICOM (.dcm) image?
Digital Imaging and Communications in Medicine (DICOM) is the standard for the communication and management of medical imaging information and related data. DICOM is most commonly used for storing and transmitting medical images, enabling the integration of medical imaging devices such as scanners, servers, workstations, printers, and network hardware.
What does the DICOM format contain?
DICOM contains image data along with other important information about the patient demographics and study parameters, such as patient name, patient ID, patient age, and patient weight. For confidentiality purposes, all information that identifies the patient is removed before the data is shared for educational or research purposes.
What is Run Length Encoding(RLE)?
Run-length encoding (RLE) is a form of lossless data compression in which runs of data (sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run. Please refer to this video for an in-depth explanation of the same.
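To make the idea concrete, here is a minimal sketch of decoding such an encoding into a binary mask. The assumptions are noted in the docstring; the competition ships its own mask utilities, which should be treated as authoritative for the exact SIIM format.

```python
import numpy as np

def rle_decode(rle: str, height: int, width: int) -> np.ndarray:
    """Decode a space-separated string of (start, length) pairs into a
    binary mask. Assumes 1-based starts and column-major pixel order,
    as in many Kaggle mask encodings; prefer the competition's own
    mask utilities for the exact SIIM format."""
    mask = np.zeros(height * width, dtype=np.uint8)
    if rle.strip() == "-1":  # "-1" marks an image with no pneumothorax
        return mask.reshape((height, width), order="F")
    nums = np.asarray(rle.split(), dtype=int)
    starts, lengths = nums[0::2] - 1, nums[1::2]  # 1-based -> 0-based
    for start, length in zip(starts, lengths):
        mask[start:start + length] = 1
    return mask.reshape((height, width), order="F")
```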
What are AP and PA X-ray views?
AP (anteroposterior) view: an X-ray picture in which the beam passes from the front of the chest to the back, where it is detected.
In a posteroanterior (PA) view, the x-ray source is positioned so that the x-ray beam enters through the posterior (back) aspect of the chest and exits out of the anterior (front) aspect, where the beam is detected. To obtain this view, the patient stands facing a flat surface behind which is an x-ray detector.
2. Business Problem
Pneumothorax is usually diagnosed by a radiologist on a chest x-ray, and can sometimes be very difficult to confirm. An accurate AI algorithm to detect pneumothorax would be useful in a lot of clinical scenarios. AI could be used to triage chest radiographs for priority interpretation or to provide a more confident diagnosis for non-radiologists.
2.1 Objective of the Problem
We are expected to build two separate models:
- One that classifies whether the patient has pneumothorax or not
- One that, for the X-ray images classified as positive, segments the affected region
3. Mapping the Business Problem to a Deep Learning Problem
Going through each X-ray image and narrowing down the location of the affected area is a time-consuming and tedious process, and small miscalculations or oversights might cost the patient his/her life. If the trained model can simplify this task for doctors, we can reduce the rate of error and probably save many lives.
3.1 Understand the data
Originally, the creators of the contest had the idea of having participants download the data from the Google Cloud Healthcare API, to learn a bit about accessing real-world data. Thanks to Jesper Sören Dramsch, the data is easily accessible here.
The main folder contains two subfolders, train and test, and a CSV file with the train labels. The train and test folders contain DICOM (Digital Imaging and Communications in Medicine) formatted files; for some reason, every single image is nested inside two more folders. The CSV content seems complicated at first, and the .dcm files cannot be opened like ordinary images.
The data is comprised of images in DICOM format and annotations in the form of image IDs and run-length-encoded (RLE) masks. Some of the images contain instances of pneumothorax (collapsed lung), which are indicated by encoded binary masks in the annotations. Some training images have multiple annotations.
There are 10,712 train files and 1,377 test files. In the CSV with the labels (EncodedPixels), -1 means no pneumothorax; otherwise, the value is an encoding of the mask for the region affected by pneumothorax.
Let us explore the given data/images to get a better understanding of the problem we are solving.
3.2 Type of Deep Learning Problem:
As we have seen in the dataset section above, we have a dataset in the form of images, and our task is to predict the mask of pneumothorax in each X-ray image. This is a semantic image segmentation problem. The resulting model can assist a physician in diagnosing pneumothorax.
To solve it, we will use deep learning techniques, which are suited to unstructured data such as audio files, video files, and images. In this particular case, we have data in the form of X-ray images, which are unstructured data.
First, a little introduction to image segmentation.
Image segmentation is the task of classifying the pixels of an image as belonging to particular object classes. Based on the way these pixels are classified, there are broadly two types of segmentation: semantic segmentation and instance segmentation. Consider the images below:
- Semantic segmentation :
In this technique, all pixels of a similar type are segmented with the same color, as we can see in the image above: the persons are detected in a pink shade and the background in black.
- Instance segmentation:
This technique segments each similar object in a different color; in the image above, each person is represented by a different color.
As discussed earlier our problem falls under the Semantic Segmentation category where we have to label a pixel either a mask or a non-mask(background).
3.3 Metrics for the Segmentation task:
Dice Coefficient:
The Dice coefficient originates from the Sørensen–Dice coefficient, a statistic developed in the 1940s to gauge the similarity between two samples. It was brought to the computer vision community by Milletari et al. in 2016 for 3D medical image segmentation, and it is equivalent to the F1 score. Put simply, the Dice coefficient is 2 * the area of overlap divided by the total number of pixels in both images. It is essentially a measure of overlap between two samples, ranging from 0 to 1, where 1 denotes perfect and complete overlap between the predicted and ground-truth masks. The Dice coefficient was originally developed for binary data and can be calculated as:
Dice = (2 * |X ∩ Y|) / (|X| + |Y|)
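As a quick illustration, here is a minimal sketch of the Dice coefficient (and the corresponding Dice loss, 1 - Dice) as a TensorFlow/Keras metric; the `smooth` term is a common addition, assumed here, to avoid division by zero on empty masks:

```python
import tensorflow as tf

def dice_coefficient(y_true, y_pred, smooth=1.0):
    """Dice = 2*|X ∩ Y| / (|X| + |Y|), computed over flattened masks."""
    y_true_f = tf.reshape(tf.cast(y_true, tf.float32), [-1])
    y_pred_f = tf.reshape(tf.cast(y_pred, tf.float32), [-1])
    intersection = tf.reduce_sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (
        tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f) + smooth)

def dice_loss(y_true, y_pred):
    # A perfect overlap (Dice = 1) gives zero loss.
    return 1.0 - dice_coefficient(y_true, y_pred)
```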
4. Existing Approaches
There have been quite a few interesting approaches. Let’s look into a few of these solutions:
1. 3rd place solution by bestfitting (https://github.com/bestfitting/kaggle/tree/master/siim_acr)
As the image sizes were a little too big to fit in RAM, bestfitting first trained a UNET model to predict the lungs from 1024x1024 images to save memory and computation; all his models were then based on the cropped lungs, and 576x576 crops were good enough for his models.
After many segmentation experiments, he chose UNET based on ResNet34 and SE-ResNeXt50 backbones, using Lovász loss along with the Adam optimizer.
2. 4th place solution by Miras Amir (https://github.com/amirassov/kaggle-pneumothorax3)
Miras Amir’s solution is based on UNet with a deep supervision branch for empty mask classification.
Model: UNet.
Backbone: ResNet34 backbone with frozen batch-normalization.
Preprocessing: training on random crops with (512, 512) size, inference on (768, 768) size.
Augmentations: ShiftScaleRotate, RandomBrightnessContrast, ElasticTransform, HorizontalFlip from albumentations.
Optimizer: Adam, batch_size=8
Scheduler: CosineAnnealingLR
Additional feature: the proportion of non-empty samples linearly decreased from 0.8 to 0.22 (as in the training dataset) depending on the epoch. It helped to converge faster.
Loss: 2.7 * BCE(pred_mask, gt_mask) + 0.9 * DICE(pred_mask, gt_mask) + 0.1 * BCE(pred_empty, gt_empty).
Here pred_mask is the prediction of the UNet, pred_empty is the prediction of the branch for empty mask classification.
Postprocessing: if pred_empty > 0.4 or area(pred_mask) falls below a threshold, the prediction is treated as an empty mask.
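As a rough sketch (my own reading of the write-up, not the author’s code), a combined loss like the one above could be wired up in Keras as follows, reusing the dice_loss defined earlier:

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

def combined_loss(gt_mask, pred_mask, gt_empty, pred_empty):
    """Weighted sum from the 4th place write-up: segmentation BCE,
    soft Dice loss, and BCE on the empty-mask classification branch."""
    return (2.7 * bce(gt_mask, pred_mask)
            + 0.9 * dice_loss(gt_mask, pred_mask)
            + 0.1 * bce(gt_empty, pred_empty))
```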
5. My First Cut Approach
1. Data Collection:
A sufficient amount of data is needed to solve any problem using Machine Learning or Deep Learning algorithms. We may have difficulties obtaining data from different sources at times, but it is not always a challenging task. Thankfully, the competition organizer has already given the data; all we have to do now is download it and get to work.
2. Preprocessing:
I’ll have to extract the images from the .dcm files, which are in the DICOM image format, and prepare the data for training the deep learning algorithm (see the pydicom sketch after this list).
3. EDA (Exploratory Data Analysis):
Because we already have the image data, we won’t be able to do as much EDA as we would on a dataset with many features to compare. However, we do have DICOM files with metadata about the images, which is a plus. Although this metadata may not be directly useful when training a model, it may provide some insight into the data.
4. Model Development:
Now I’ll be able to draw some conclusions from the data and masks produced during the preprocessing phase. In this case, I’ll solve the problem using Deep Learning. This is an image segmentation challenge, and there are a variety of deep learning algorithms for doing so.
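For step 2, here is a minimal sketch of reading one DICOM file with pydicom; the file path is a placeholder for one of the nested .dcm files in the train folder:

```python
import pydicom

# Placeholder path: the real files sit two folders deep inside train/.
ds = pydicom.dcmread("train/some_study/some_series/image.dcm")

image = ds.pixel_array  # the X-ray as a numpy array (1024x1024 grayscale)
# A few of the metadata tags mentioned earlier:
print(ds.PatientAge, ds.PatientSex, ds.ViewPosition)
```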
6. Exploratory Data Analysis
Exploratory data analysis is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task.
Let’s start with reading the data and exploratory data analysis,
The number of images in the train and test:
It can be seen that the training set is not balanced: most images have no pneumothorax.
Gender and Pneumothorax distributions:
There are slightly more men with pneumothorax than women. We also know the age of the patients, so let’s look at the age distribution (to create a histogram, you need to know the maximum age, found e.g. by sorting). In the second pie chart, we can see that for both men and women there are more healthy patients than ill ones: around 77.4% healthy versus 22.6% with pneumothorax.
Age-wise Pneumothorax distribution:
From this histogram, we can easily read off the number of patients at each age. Age 60 has the highest count, and the ages between 47 and 66 each have more than 200 patients.
At first glance, the age distribution seems almost typical, but it turns out that pneumothorax doesn’t really depend on one’s age. In this dataset, pneumothorax barely affects babies aged 0–6 years. From ages 7 to 85, at least one patient is affected at each age, with the largest number of affected patients at age 51. However, we cannot say that a certain age demographic is more affected, because there is so much variation, i.e. patients of almost all ages above 6 are affected.
Visualizing The Chest X-Ray information:
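Here is a sketch of how such a visualization could be produced, reusing rle_decode from earlier (the file path and the rle_string pulled from the CSV are placeholders):

```python
import matplotlib.pyplot as plt
import pydicom

ds = pydicom.dcmread("train/some_study/some_series/image.dcm")  # placeholder path
image = ds.pixel_array

plt.figure(figsize=(6, 6))
plt.imshow(image, cmap="bone")               # the X-ray itself
mask = rle_decode(rle_string, *image.shape)  # rle_string: taken from the CSV
plt.imshow(mask, cmap="Reds", alpha=0.3)     # translucent mask overlay
plt.axis("off")
plt.show()
```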
7. Preparing Data for the model
We shall use the tf.data pipeline rather than ImageDataGenerator and similar pipelines.
Why use the tf.data pipeline?
The tf.data API enables you to build complex input pipelines from simple, reusable pieces. For example, the pipeline for an image model might aggregate data from files in a distributed file system, apply random perturbations to each image, and merge randomly selected images into a batch for training.
The Dataset API allows you to build an asynchronous, highly optimized data pipeline to prevent your GPU from data starvation. It loads data from the disk (images or text), applies optimized transformations, creates batches, and sends them to the GPU. Older data pipelines made the GPU wait for the CPU to load the data, leading to performance issues.
Applying some transformations to our dataset:
The tf.data.Dataset.cache transformation can cache a dataset, either in memory or on local storage. This will save some operations (like file opening and data reading) from being executed during each epoch. The next epochs will reuse the data cached by the cache transformation.
tf.data.Dataset.prefetch overlaps the preprocessing and model execution of a training step. While the model is executing training step s, the input pipeline is reading the data for step s+1. Doing so reduces the step time to the maximum (as opposed to the sum) of the training and the time it takes to extract the data.
The repeat() method of the tf.data.Dataset class is used to repeat the tensors in the dataset.
shuffle() shuffles the train_dataset with a buffer of size 1500 for picking random entries.
batch() will take the first 32 entries, based on the batch size set, and make a batch out of them.
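Putting these together, here is a minimal sketch of the pipeline under some assumptions: train_image_paths and train_mask_paths are hypothetical lists of file paths, and the images are assumed to have been exported from DICOM to PNG during preprocessing:

```python
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE  # tf.data.experimental.AUTOTUNE on older TF 2.x

def load_pair(image_path, mask_path):
    # Read and decode one image/mask pair, scaled to [0, 1].
    image = tf.io.decode_png(tf.io.read_file(image_path), channels=1)
    mask = tf.io.decode_png(tf.io.read_file(mask_path), channels=1)
    return tf.cast(image, tf.float32) / 255.0, tf.cast(mask, tf.float32) / 255.0

train_dataset = (
    tf.data.Dataset.from_tensor_slices((train_image_paths, train_mask_paths))
    .map(load_pair, num_parallel_calls=AUTOTUNE)
    .cache()             # avoid re-reading files on every epoch
    .shuffle(1500)       # buffer of 1500, as described above
    .repeat()            # loop over the dataset indefinitely
    .batch(32)           # batches of 32
    .prefetch(AUTOTUNE)  # overlap preprocessing with model execution
)
```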
8. Data Preprocessing
Data Augmentation
What is data augmentation?
It is a technique to increase the diversity of your training set by applying random (but realistic) transformations. It exposes the model to various scenarios that the model might face in the testing data. Data augmentation is an integral process in deep learning, as in deep learning we need large amounts of data and in some cases, it is not feasible to collect thousands or millions of images, so data augmentation comes to the rescue.
It helps us to increase the size of the dataset and introduce variability into it. For more information regarding data augmentation, click here.
It’s not a good idea to apply every augmentation to every image. For the sake of convenience, let’s pick a random augmentation technique and apply it to a single image, as in the sketch below.
Assigning the data to the functions:
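A minimal sketch of such a function; the choice of flips and brightness jitter is my own illustrative assumption, and the function is written for eager execution (inside a tf.data map you would branch with tf.cond instead):

```python
import tensorflow as tf

def random_augment(image, mask):
    """Pick one augmentation at random and apply it to an image/mask
    pair, keeping the two geometrically in sync."""
    choice = int(tf.random.uniform([], 0, 3, dtype=tf.int32))
    if choice == 0:    # horizontal flip: flip image and mask together
        image = tf.image.flip_left_right(image)
        mask = tf.image.flip_left_right(mask)
    elif choice == 1:  # brightness jitter: image only, mask unchanged
        image = tf.image.random_brightness(image, max_delta=0.1)
    # choice == 2: leave the pair as-is
    return image, mask

# Applying it to a single image/mask pair from the pipeline built earlier:
for image, mask in train_dataset.unbatch().take(1):
    aug_image, aug_mask = random_augment(image, mask)
```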
9. Preparation of Deep Learning Models:
The main goal of this study is not to attain high accuracy or to outperform any model; rather, it is to investigate how the model performs and conduct performance analysis in order to comprehend the model’s actions.
9.1 Classification:
What is transfer learning?
Transfer learning is a machine learning method where a model developed for a task is reused as the starting point for a model on a second task.
It is a popular approach in deep learning, where pre-trained models are used as the starting point for computer vision and natural language processing tasks, given the vast compute and time resources required to develop neural network models for these problems and the huge jumps in skill that they provide on related problems.
What is the Densenet Model?
A DenseNet is a type of convolutional neural network that utilizes dense connections between layers, through Dense Blocks, where we connect all layers (with matching feature-map sizes) directly with each other. To preserve the feed-forward nature, each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers.
To load the available weights, we first built the Densenet121 base model (1000 classes):
We initialized the Densenet121 model with weights from a model trained on chest radiograph data and froze the first few layers. Then, by replacing the final dense layer with a single unit, we turned the Densenet121 model into our own binary classifier.
With the Adam optimizer and binary cross-entropy loss, we used accuracy, precision, and recall as metrics.
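A minimal sketch of such a classifier; the input size and the number of frozen layers are placeholders, and 'imagenet' weights stand in for the chest-radiograph-pretrained weights, whose loading is elided here:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import DenseNet121

# Base model without its 1000-class head.
base = DenseNet121(include_top=False, weights="imagenet",
                   input_shape=(256, 256, 3), pooling="avg")
for layer in base.layers[:100]:  # freeze the first few layers
    layer.trainable = False

output = layers.Dense(1, activation="sigmoid")(base.output)  # single unit
classifier = Model(base.input, output)

classifier.compile(optimizer=tf.keras.optimizers.Adam(),
                   loss="binary_crossentropy",
                   metrics=["accuracy",
                            tf.keras.metrics.Precision(),
                            tf.keras.metrics.Recall()])
```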
Once we’ve finished with classification, we’ll move on to segmentation, which is the meat of this case study.
9.2 Image Segmentation:
Image segmentation can be performed by various algorithms. Let’s begin with the most basic model and see if we can outperform it. We’ll experiment with various iterations of the model. First, let me give you a quick overview of the UNET model.
What is a UNET model? (Source: Wikipedia)
U-Net is a convolutional neural network that was developed for biomedical image segmentation at the Computer Science Department of the University of Freiburg, Germany. The network is based on the fully convolutional network and its architecture was modified and extended to work with fewer training images and to yield more precise segmentation.
UNET Architecture
The main idea of the UNET model is to supplement a usual contracting network by successive layers, where pooling operations are replaced by upsampling operators. Hence these layers increase the resolution of the output. What’s more, a successive convolutional layer can then learn to assemble a precise output based on this information.
One important modification in U-Net is that there are a large number of feature channels in the upsampling part, which allow the network to propagate context information to higher resolution layers. As a consequence, the expansive path is more or less symmetric to the contracting part and yields a u-shaped architecture. The network only uses the valid part of each convolution without any fully connected layers. To predict the pixels in the border region of the image, the missing context is extrapolated by mirroring the input image. This tiling strategy is important to apply the network to large images since otherwise the resolution would be limited by the GPU memory.
For more information regarding the UNET architecture click here.
9.2.1. Simple UNET model
Now that we have a basic understanding of UNET’s architecture, let’s try it out with our images and masks. For the segmentation task, we can use the same pipeline (tf.data). We’ll also apply the same augmentation techniques to the images as we did for the classification task.
There are a few packages available online, such as https://github.com/qubvel/segmentation_models, which can be installed and used for initializing and training models such as Unet, FPN, LinkNet, and PSPNet.
Unfortunately, these models have compatibility issues with TensorFlow versions greater than 2.0. To avoid such issues, it is recommended to implement the model from scratch, as the architecture is simple; a minimal sketch follows.
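Here is a minimal from-scratch UNET sketch; the depth, filter counts, and 256x256 input size are illustrative assumptions, and dice_coefficient is the metric defined earlier:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Two 3x3 convolutions: the basic UNET building block.
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

def build_unet(input_shape=(256, 256, 1)):
    inputs = layers.Input(input_shape)
    skips, x = [], inputs
    for filters in (32, 64, 128):              # contracting path
        x = conv_block(x, filters)
        skips.append(x)                        # saved for skip connections
        x = layers.MaxPooling2D(2)(x)
    x = conv_block(x, 256)                     # bottleneck
    for filters, skip in zip((128, 64, 32), reversed(skips)):  # expanding path
        x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])    # skip connection
        x = conv_block(x, filters)
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)  # per-pixel mask
    return Model(inputs, outputs)

model = build_unet()
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy", dice_coefficient])  # metric from earlier
```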
The best model’s performance is val_loss: 0.0603, val_accuracy: 0.9848, val_dice_coefficient: 0.2509 at the 68th epoch.
This model doesn’t perform up to the mark. We shall try the UNET architecture again, but with a different architecture as the backbone.
9.2.2. Custom Unet model with Densenet121
The best model’s performance is val_loss: 0.0580, val_accuracy: 0.9874, val_dice_coefficient: 0.4608 at the 23rd epoch.
This model converges faster than the previous UNET model, and it produces strong results compared to it.
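Here is a sketch of how a Densenet121 encoder can be wired into a UNET decoder; the skip-connection layer names are the usual choices for Keras’s DenseNet121, but treat them (and the filter counts) as assumptions to verify against model.summary():

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import DenseNet121

def build_densenet_unet(input_shape=(256, 256, 3)):
    backbone = DenseNet121(include_top=False, weights="imagenet",
                           input_shape=input_shape)
    # Feature maps at 1/2, 1/4, 1/8, and 1/16 resolution for the skips.
    skip_names = ["conv1/relu", "pool2_relu", "pool3_relu", "pool4_relu"]
    skips = [backbone.get_layer(name).output for name in skip_names]
    x = backbone.output                       # 1/32-resolution bottleneck
    for filters, skip in zip((512, 256, 128, 64), reversed(skips)):
        x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(32, 2, strides=2, padding="same")(x)  # full resolution
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return Model(backbone.input, outputs)
```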
9.2.3. Training Unet with Densenet121 as backbone (with Dropout layers)
Sometimes it is better to use Dropout layers, as we do not want to overfit the model. Overfitting always leads to poor performance on the test data.
Let’s add a few dropout layers at random points in the architecture and monitor the performance, as in the sketch below.
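One way such a block might look (the dropout rate of 0.3 and its position after the concatenation are assumptions):

```python
from tensorflow.keras import layers

def decoder_block_with_dropout(x, skip, filters, rate=0.3):
    # The decoder step from the previous sketch, with a Dropout layer
    # slipped in after the skip concatenation.
    x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)
    x = layers.Concatenate()([x, skip])
    x = layers.Dropout(rate)(x)  # randomly zeroes `rate` of activations
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
```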
The best model’s performance is val_loss: 0.0627, val_accuracy: 0.9856, val_dice_coef: 0.4096 at the 24th epoch. This model does not converge as fast as the previous model.
9.2.4. Custom Unet (Backbone: Densenet121) model without data augmentation
Till now we were training the model using the augmented images. Let’s see if there is any change in the performance of the model if there isn’t any data augmentation.
It has the same architecture as the previous model. We shall directly feed the images and masks from the training data to the model without augmenting them.
The best model’s performance is val_loss: 0.0527, val_accuracy: 0.9865, val_dice_coef: 0.4209 at the 31st epoch. This model does not converge as fast as the previous model.
9.2.5. Visualizing a few X-ray images and masks from the best model
The best model is the Custom Unet (Backbone: Densenet121) architecture. Some of the predictions are shown below:
Although the predictions are not perfectly accurate, the model narrows down the position of the affected region. For doctors and other domain specialists, this would undoubtedly save a lot of time.
10. Summary
The segmentation model that retrained the ImageNet weights using the Densenet architecture yielded great results, with a test accuracy of 0.9875.
For the Segmentation model:
11. Deployment
I used the Streamlit platform to deploy the above-mentioned model; here’s a sample video of how the web application operates.
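For a flavor of what such an app could look like, here is a minimal hypothetical Streamlit sketch (the model file name, 256x256 input size, and grayscale preprocessing are placeholders, not the deployed app’s exact code):

```python
import numpy as np
import streamlit as st
import tensorflow as tf
from PIL import Image

st.title("Pneumothorax Segmentation")
# Placeholder model file; compile=False since we only need inference.
model = tf.keras.models.load_model("unet_densenet121.h5", compile=False)

uploaded = st.file_uploader("Upload a chest X-ray", type=["png", "jpg"])
if uploaded is not None:
    image = Image.open(uploaded).convert("L").resize((256, 256))
    x = np.asarray(image, dtype=np.float32)[None, ..., None] / 255.0
    mask = model.predict(x)[0, ..., 0]  # per-pixel probabilities
    st.image(image, caption="Input X-ray")
    st.image((mask > 0.5).astype(float), caption="Predicted mask")
```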
12. Future Work
We can use different methods available for handling data imbalance, better augmentation techniques, etc. One can definitely try other segmentation architectures such as Mask-RCNN, DeepLabV3, etc. Unet++ architecture can be used for solving this deep learning problem.
We can train this model for other medical images and use the model to diagnose other diseases.
The model can be turned into an API that can be used by medical practitioners/experts as an initial screening tool in medical diagnosis.
13. References
- https://www.appliedaicourse.com/
- https://www.kaggle.com/jesperdramsch/intro-chest-xray-dicom-viz-u-nets-full-data
- https://www.kaggle.com/schlerp/getting-to-know-dicom-and-the-data/notebook
- U-Net paper: https://arxiv.org/abs/1505.04597
- https://medium.com/@karan_jakhar/100-days-of-code-day-7-84e4918cb72c
- https://towardsdatascience.com/metrics-to-evaluate-your-semantic-segmentation-model-6bcb99639aa2
- https://www.analyticsvidhya.com/blog/2019/04/introduction-image-segmentation-techniques-python/
- https://pydicom.github.io/pydicom/stable/auto_examples/input_output/plot_read_dicom.html
14. Code Repository
All the code presented in this blog can be found in the following GitHub repository.
Thanks for taking the time to read this blog! Don’t forget to clap and leave a comment.