Deep Learning project: Brain tissue image segmentation
This project presents an implementation of Mask R-CNN on a dataset of ultra-thin sections of biological tissue imaged by light microscopy. The goal was to accurately segment each section in a large image in order to determine the section coordinates. These coordinates will later be used for automated image acquisition on a high-resolution microscope such as an electron microscope. This Machine Learning course project at EPFL was done in collaboration with the Center for Interdisciplinary Electron Microscopy (CIME) at EPFL. We applied our knowledge to a practical real-world problem that involved implementing complex machine learning models for image segmentation and feature detection. The code is available at https://github.com/pelletierkevin/BrainTissue_SegMachineLearning, where tissue-parts detection contains the notebooks and augmented data, and JekeelMaskRCNN contains the files used for training the models.
I. Data exploration
Our data for this project were images of silicon wafers for which the positions of the sections had already been collected. The first particularity of our dataset is therefore that we did not have a large number of different images, but each of them contains many features. We started the project with 3 fully labelled wafers.
A second specificity is that, although the images differ from one another because the sections are (more or less) randomly located, the sections themselves are very similar and vary little from one to the next.
As shown in Figure 1, a section consists of a part containing the brain tissue and another made of a magnetic material used to recognize and order each section. Our task is therefore, in a given image, to locate each of these parts for all the sections found. A section also often contains a smaller “dummy part”, which is not relevant to extract and is only there to help during the slicing of the sections. It may or may not be attached to the section, so its position varies considerably from one wafer to another. Finally, in addition to the grayscale images just mentioned, we also used images in which only the magnetic parts are visible. As we will see, these help us differentiate the magnetic parts from those containing a brain section.
II. Data preprocessing
A. Data preparation
As we have just seen, the input to our system consists of very large images that each contain many features. However, the complex deep learning models we use work better on small images, so from now on we work only on smaller patches. The first preprocessing step is therefore to divide a wafer image into multiple smaller images, each containing about 4 or 5 sections. More precisely, we create images of size 512×512 to train on.
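The patch extraction described above can be sketched as follows. This is a minimal illustration, not the project's actual script: the function name split_into_patches is hypothetical, and zero-padding the borders so that every patch is exactly 512×512 is an assumption (incomplete border patches could just as well be discarded).

```python
import numpy as np

def split_into_patches(image, patch_size=512):
    """Cut an image array (H x W or H x W x C) into non-overlapping square patches.

    Assumption: the image is zero-padded on the bottom and right so that its
    height and width become multiples of patch_size.
    Returns a list of ((top, left), patch) pairs.
    """
    height, width = image.shape[:2]
    pad_h = (-height) % patch_size
    pad_w = (-width) % patch_size
    padding = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, padding, mode="constant")

    patches = []
    for top in range(0, padded.shape[0], patch_size):
        for left in range(0, padded.shape[1], patch_size):
            patch = padded[top:top + patch_size, left:left + patch_size]
            patches.append(((top, left), patch))
    return patches

# Example: a small dummy "wafer" standing in for a real image.
wafer = np.zeros((1300, 1100), dtype=np.uint8)
patches = split_into_patches(wafer)
```

Keeping the (top, left) offset with each patch makes it possible to map section coordinates found in a patch back to the full wafer image.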
Our initial dataset contained the grayscale images of the wafers and the corresponding fluorescent images, which highlight the magnetic resin part. We used this information to create three-channel (RGB) images by placing the grayscale image in two channels and the fluorescent image in the third. We therefore work with RGB images, in which we expect colour to help differentiate the magnetic parts from the brain tissue during training.
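As a rough sketch of this channel combination, assuming both images are already aligned and have the same shape: the function name combine_channels and the choice of which two channels receive the grayscale image are assumptions, since the text does not specify the channel order.

```python
import numpy as np

def combine_channels(gray, fluo):
    """Build an H x W x 3 image from a grayscale and a fluorescent image.

    Assumption: grayscale goes into channels 0 and 1, fluorescence into channel 2.
    """
    if gray.shape != fluo.shape:
        raise ValueError("grayscale and fluorescent images must have the same shape")
    return np.stack([gray, gray, fluo], axis=-1)

# Example with tiny dummy arrays.
gray = np.full((4, 4), 120, dtype=np.uint8)
fluo = np.zeros((4, 4), dtype=np.uint8)
fluo[1:3, 1:3] = 255  # pretend this square is a magnetic part
rgb = combine_channels(gray, fluo)
```

With this layout, pixels belonging to a magnetic part stand out in the third channel while the brain tissue remains neutral.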
B. Artificial data generation
The analysis of our data in the previous section leads to a key point: the amount of data may not be large enough to optimally train complex deep learning models. However, the characteristics of our images, and their similarity, allow us to solve this problem. Indeed, looking at the images we were provided, we quickly notice that recognizing the sections is relatively simple, since our data consist only of these sections and of the background. So we implemented a script that artificially generates new images on which to train our models.
The first part of this work was to extract a large number of sections from the images. We then projected them randomly onto a predefined background, resulting in an image similar to the ones that would be provided to our final model, while making sure that no two sections overlapped. The generation script lets us choose the original wafer base image, the number of images produced and their size. These data were particularly useful because the original wafers are densely crowded with sections and show little background. Our generated images helped the model distinguish between sections and background.
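The idea of pasting extracted sections onto a background without overlap can be sketched as below. This is a simplified illustration under stated assumptions, not the project's generation script: the function name, the rejection-sampling strategy and the (patch, mask) representation of a section are all hypothetical.

```python
import numpy as np

def generate_synthetic_image(background, sections, n_sections, rng, max_tries=100):
    """Paste randomly chosen sections onto a copy of the background.

    sections: list of (patch, mask) pairs, where patch is an h x w array and
    mask is an h x w boolean array marking the pixels of the section.
    A position is rejected if the section would overlap one already placed.
    Returns the composed image and one full-size boolean mask per placed section.
    """
    image = background.copy()
    occupied = np.zeros(background.shape[:2], dtype=bool)
    instance_masks = []
    for _ in range(n_sections):
        patch, mask = sections[int(rng.integers(len(sections)))]
        h, w = mask.shape
        for _attempt in range(max_tries):
            top = int(rng.integers(0, image.shape[0] - h + 1))
            left = int(rng.integers(0, image.shape[1] - w + 1))
            window = occupied[top:top + h, left:left + w]
            if np.any(window & mask):
                continue  # would overlap an existing section, try again
            image[top:top + h, left:left + w][mask] = patch[mask]
            window |= mask  # in-place update of the view marks pixels as occupied
            full_mask = np.zeros_like(occupied)
            full_mask[top:top + h, left:left + w] = mask
            instance_masks.append(full_mask)
            break
    return image, instance_masks

# Example with one dummy square section on an empty background.
rng = np.random.default_rng(0)
background = np.zeros((64, 64), dtype=np.uint8)
section = (np.full((10, 10), 200, dtype=np.uint8), np.ones((10, 10), dtype=bool))
image, masks = generate_synthetic_image(background, [section], 5, rng)
```

Because each placed section comes with its own full-size mask, the generated image is immediately usable as a labelled training example.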
C. Data augmentation
The idea behind data augmentation is illustrated in the following figure. Now that we have generated a very large dataset, we try to improve it further. Indeed, to train an efficient neural network, we need to account for images taken under all sorts of conditions. Making our model able to cope with all the changes that may occur makes it more robust.
What we are trying to do here is to teach our neural network invariance, which is the ability to recognize a feature regardless of the conditions in which it is presented. The process consists of taking an image from the dataset, on which the model should in theory have little difficulty, and artificially applying modifications to it, creating one new image per modification. But we must be careful when applying these changes to our images: we need to make sure that the data remain relevant to our specific problem. For example, the shape of the sections we are trying to detect will always be a polygon. Applying a distortion that modifies the shape of our sections would therefore be a mistake, because we would teach the network incorrect information.
When the images are loaded into the model, we then apply several augmentation techniques: random crops, random rotations, Gaussian blurring, and random horizontal and vertical flips.
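The key constraint for segmentation is that every geometric transformation must be applied identically to the image and to its masks. The sketch below illustrates this with NumPy only. It is a deliberately reduced stand-in: it covers flips and rotations by multiples of 90 degrees, not the arbitrary rotations, crops and blurring mentioned above, and the function name augment is hypothetical.

```python
import numpy as np

def augment(image, masks, rng):
    """Apply the same random flips and 90-degree rotation to image and masks.

    image: H x W x C array; masks: H x W x N boolean array (one channel per instance).
    These transformations preserve the polygonal shape of the sections.
    """
    if rng.random() < 0.5:  # horizontal flip
        image, masks = image[:, ::-1], masks[:, ::-1]
    if rng.random() < 0.5:  # vertical flip
        image, masks = image[::-1], masks[::-1]
    k = int(rng.integers(0, 4))  # number of quarter turns
    image, masks = np.rot90(image, k), np.rot90(masks, k)
    return np.ascontiguousarray(image), np.ascontiguousarray(masks)

# Example: a single bright pixel and its mask must stay aligned.
rng = np.random.default_rng(0)
image = np.zeros((8, 8, 3), dtype=np.uint8)
image[1, 2] = 255
masks = np.zeros((8, 8, 1), dtype=bool)
masks[1, 2, 0] = True
aug_image, aug_masks = augment(image, masks, rng)
```

Photometric changes such as Gaussian blurring, by contrast, should be applied to the image only and leave the masks untouched.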
III. Models & Methods
A. Baseline model Mask-RCNN
The first challenge of our model will be to detect a section in the image, and thus successfully determine a region of interest in which the section is located. This is called detection. Then, once our model has located a part of the image in which a section is located, it will be necessary to extract precisely the contours to obtain its orientation for example. This is segmentation.
The implementation is based on the existing Mask-RCNN implementation by Matterport Inc., which is itself built on the open-source libraries Keras and TensorFlow. This implementation is well documented and easy to extend for our purposes. For additional information, we refer readers to the notebooks in our GitHub repository. Mask-RCNN relies on region proposals generated by a region proposal network (RPN). It follows the Faster-RCNN design: a feature extractor, followed by the region proposal network, followed by a region-of-interest pooling operation that produces fixed-size outputs suitable as input to a classifier (Faster-RCNN uses ROI-Pooling, which Mask-RCNN replaces with the more precise RoIAlign). On top of this, Mask-RCNN adds a small fully convolutional network that predicts a segmentation mask for each region, which is what allows accurate instance segmentation. Mask and class predictions are decoupled: the mask branch outlines the parts of the sections (whether brain or magnet) independently of the branch that predicts whether each part is brain or magnet. This entails the use of a multitask loss function:
L = L_cls + L_bbox + L_mask (respectively the classification loss, the bounding-box loss and the mask loss).
Mask-RCNN is built on a backbone convolutional neural network for feature extraction, such as ResNet-50, which can be combined with a feature pyramid network (FPN) to obtain good performance in both accuracy and speed.
Github link to Mask_RCNN repository : https://github.com/matterport/Mask_RCNN
B. Transfer learning
To optimize the detection process, we used a ResNet-50 backbone and relied on transfer learning, which is particularly useful to gain accuracy with such models. Additionally, once we had trained on the original dataset and the artificial images, we used the resulting weights as a pretrained model to train only the head layers on the new wafer images provided. Since the new wafers come without ground-truth labels, we implemented a tool that generates artificial images from a few manually labelled sections. This lets the model discover new section shapes and train a little further.
C. Model Implementation
Using our data generation scripts, we created and organized 900 images for the training set, 90 for the validation set and 50 for the test set, each paired with its masks, with a size of 512 or 1024 depending on the section sizes. We based our model configuration on the Nucleus sample. We set the number of validation steps per epoch to more than 100, in line with our validation set size. We set the anchor sizes to range from 16 to 256: we sometimes have big sections, but we kept small anchors to be able to detect the cropped parts at the edges. As some images contain a large number of sections, we set the parameters controlling the maximum number of detections and of ground-truth instances to a high value (more than 300). Furthermore, we set the resizing mode to cropping with a size of 512, which matches our dataset. Finally, we trained all the layers of the model for 40 epochs, as the loss was still decreasing.
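In the Matterport implementation, such settings are expressed by subclassing its Config class. The sketch below shows what a configuration along these lines might look like. The attribute names are those of the Matterport Config class, but the class name SectionsConfig and several values are assumptions: the text only says "more than 300" and "more than 100", so 400 and 110 are placeholders, the intermediate anchor scales are guessed, and the class count assumes background plus brain and magnet.

```python
try:
    from mrcnn.config import Config  # Matterport Mask_RCNN package, if installed
except ImportError:
    class Config:  # minimal stand-in so the sketch can be read without the package
        pass

class SectionsConfig(Config):
    """Hypothetical configuration mirroring the settings described in the text."""
    NAME = "sections"                  # assumed experiment name
    BACKBONE = "resnet50"
    NUM_CLASSES = 1 + 2                # assumption: background + brain + magnet

    # Random 512 x 512 crops, matching the size of the training images.
    IMAGE_RESIZE_MODE = "crop"
    IMAGE_MIN_DIM = 512
    IMAGE_MAX_DIM = 512

    # Anchors from 16 to 256; the intermediate scales are an assumption.
    RPN_ANCHOR_SCALES = (16, 32, 64, 128, 256)

    # "More than 300" in the text; 400 is a placeholder value.
    DETECTION_MAX_INSTANCES = 400
    MAX_GT_INSTANCES = 400

    # "More than 100" in the text; 110 is a placeholder value.
    VALIDATION_STEPS = 110

    # Lower bound of the confidence range used at detection time.
    DETECTION_MIN_CONFIDENCE = 0.9
```

Small anchors matter here because sections cut by a patch border appear as small fragments, while the high instance limits prevent detections from being truncated on crowded wafers.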
As we can see in the figure, the model seems neither to overfit, as the gap between the training loss and the validation loss is small, nor to underfit. However, we can observe that this gap grows at epoch 40, which suggests the onset of overfitting. That is why we kept the model from epoch 35. Finally, we computed the mean average precision (mAP) and the mean recall on our separate test set, comparing the ground-truth bounding boxes with the predicted ones. These computations used an Intersection over Union (IoU) threshold of 0.5.
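To make the role of the IoU threshold concrete, the sketch below computes the IoU of two boxes and a simple precision and recall at a given threshold using greedy matching. It is a simplified illustration, not the evaluation code used in the project: it does not compute the full average precision, and the function names and the (y1, x1, y2, x2) box convention are assumptions.

```python
def box_iou(a, b):
    """Intersection over Union of two boxes given as (y1, x1, y2, x2)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def precision_recall(gt_boxes, pred_boxes, iou_threshold=0.5):
    """Greedy matching of predictions to ground truth at a fixed IoU threshold.

    pred_boxes is assumed to be sorted by decreasing confidence. Each ground-truth
    box can be matched at most once; unmatched predictions are false positives.
    """
    matched = set()
    true_positives = 0
    for pred in pred_boxes:
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(gt_boxes):
            if idx in matched:
                continue
            iou = box_iou(pred, gt)
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            true_positives += 1
    precision = true_positives / len(pred_boxes) if pred_boxes else 0.0
    recall = true_positives / len(gt_boxes) if gt_boxes else 0.0
    return precision, recall

# Example: one correct detection, one false positive, one missed section.
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
pred = [(0, 0, 10, 10), (50, 50, 60, 60)]
precision, recall = precision_recall(gt, pred)
```

With a threshold of 0.5, a prediction counts as correct only if it covers at least half of the union of itself and the ground-truth box.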
One result of our training can be seen in the following figure, where we applied our model to a crop of an unseen wafer. Depending on the wafer provided, we can adjust the detection confidence threshold between 0.9 and 0.99. As these images were not included in the training set, this suggests that the model is not overfitting.
The model sometimes produces false positives or misses some targets on the new wafers, depending on the shape of the unseen sections. If the shapes are similar to those in the training set, it performs well, whereas different shapes can cause detection problems. We therefore implemented a pipeline in which, for a new wafer, we can manually label a few of its sections in a JSON file and generate artificial data from them together with the associated masks. We also generate the 3-channel images from the original and the fluorescent images, to match the images the model is used to. We can then run a short additional training, focusing only on the head layers for a few epochs and starting from the previously trained model. Finally, as the wafer images are usually large, running detection directly on the full image causes memory issues. We therefore implemented a script that crops the entire image into several overlapping images, which are then processed and segmented individually. Furthermore, we apply a method to filter the bounding boxes on each crop, depending on their position.
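The overlapping-crop strategy can be sketched as follows. The text does not detail how boxes are filtered by position, so the rule used here is an assumption: a box that touches a crop border lying inside the image is treated as a truncated section and dropped, on the expectation that a neighbouring crop sees it whole. The function names and the margin parameter are hypothetical.

```python
def crop_origins(length, crop_size, overlap):
    """Start offsets of overlapping windows that together cover [0, length)."""
    if not 0 <= overlap < crop_size:
        raise ValueError("overlap must be in [0, crop_size)")
    if length <= crop_size:
        return [0]
    stride = crop_size - overlap
    origins = list(range(0, length - crop_size, stride))
    origins.append(length - crop_size)  # last window is flush with the image edge
    return origins

def to_global_boxes(boxes, top, left, crop_size, image_shape, margin=2):
    """Drop boxes cut by an interior crop border, shift the rest to image coordinates.

    boxes: (y1, x1, y2, x2) tuples in crop coordinates.
    image_shape: (height, width) of the full wafer image.
    """
    height, width = image_shape
    kept = []
    for y1, x1, y2, x2 in boxes:
        cut_top = y1 <= margin and top > 0
        cut_left = x1 <= margin and left > 0
        cut_bottom = y2 >= crop_size - margin and top + crop_size < height
        cut_right = x2 >= crop_size - margin and left + crop_size < width
        if not (cut_top or cut_left or cut_bottom or cut_right):
            kept.append((y1 + top, x1 + left, y2 + top, x2 + left))
    return kept

# Example: windows along one axis of a 1000-pixel image, and boxes from one crop.
origins = crop_origins(1000, 512, 128)
boxes_in_crop = [(0, 0, 50, 50), (100, 100, 200, 200)]
global_boxes = to_global_boxes(boxes_in_crop, top=384, left=0,
                               crop_size=512, image_shape=(1000, 1000))
```

For this rule to lose no section, the overlap should be at least as large as the largest section. Sections lying entirely inside an overlap zone are still detected twice and would need a final deduplication step such as non-maximum suppression.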
The GitHub repository for this project, containing all the Python notebooks and files, is available at https://github.com/pelletierkevin/BrainTissue_SegMachineLearning
You can also find the report, named Brain_segmentation.pdf, in the GitHub repository.