Normalization of HE-stained histological images using cycle consistent generative adversarial networks

Runz, Marlen; Rusche, Daniel; Schmidt, Stefan; Weihrauch, Martin R.; Hesser, Jürgen; Weis, Cleo-Aron

doi:10.1186/s13000-021-01126-y

Research
Open access
Published: 06 August 2021

Marlen Runz^1,2 (ORCID: orcid.org/0000-0002-4914-4799),
Daniel Rusche^1,
Stefan Schmidt^3,
Martin R. Weihrauch^4,
Jürgen Hesser^2,5,6 &
Cleo-Aron Weis^1

Diagnostic Pathology volume 16, Article number: 71 (2021)

Abstract

Background

Histological images show strong variance (e.g. illumination, color, staining quality) due to differences in image acquisition, tissue processing, staining, etc. This can impede downstream image analysis such as staining intensity evaluation or classification. Methods to reduce these variances are called image normalization techniques.

Methods

In this paper, we investigate the potential of CycleGAN (cycle consistent Generative Adversarial Network) for color normalization in hematoxylin-eosin stained histological images using daily clinical data, taking into account variations in internal staining protocols. The network consists of a generator network G_B that learns to map an image X from a source domain A to a target domain B, i.e. G_B:X_A→X_B. In addition, a discriminator network D_B is trained to distinguish whether an image from domain B is real or generated. The same process is applied to another generator-discriminator pair (G_A,D_A) for the inverse mapping G_A:X_B→X_A. Cycle consistency ensures that a generated image is close to its original when mapped backwards (G_A(G_B(X_A))≈X_A and vice versa). We validate the CycleGAN approach on a breast cancer challenge data set and a follicular thyroid carcinoma data set for various stain variations. We evaluate the quality of the generated images compared to the original images using similarity measures. In addition, we apply stain normalization to pathological lymph node data from our institute and test the gain from normalization on a ResNet classifier pre-trained on the Camelyon16 data set.

Results

Qualitative results of the images generated by our network are compared to original color distributions. Our evaluation indicates that by mapping images to a target domain, the similarity to training images from that domain improves by up to 96%. We also achieve a high cycle consistency for the generator networks, obtaining similarity indices greater than 0.9. When applying the CycleGAN normalization to HE-stained images from our institute, the kappa-value of the ResNet model that is trained only on Camelyon16 data increases by more than 50%.

Conclusions

CycleGANs have proven to efficiently normalize HE-stained images. The approach compensates for deviations resulting from image acquisition (e.g. different scanning devices) as well as from tissue staining (e.g. different staining protocols), and thus overcomes the staining variations in images from various institutions. The code is publicly available at https://github.com/m4ln/stainTransfer_CycleGAN_pytorch. The data set supporting the solutions is available at https://doi.org/10.11588/data/8LKEZF.

Background

In both histology and surgical pathology, the inherent individual appearance of the examined tissue and the differences between staining protocols must be compensated for, in addition to factors that influence the image acquisition (e.g. scanning devices). This requirement applies to hematoxylin-eosin (HE) staining, the standard method in pathology, but also to all other histochemical and immunohistochemical stainings. For HE-staining, solutions and protocols appear standardized at first glance. However, even within a single institution, protocols may vary slightly and may not be coordinated with other institutes. Especially when training deep neural networks, for example for image classification, there is a need for stain normalization of images so that models are transferable to other data sets.

The term color normalization is an umbrella term for image processing techniques that compensate for effects such as variable illumination, camera settings, etc. This evident need drives active research. Conventional image processing techniques such as color deconvolution or look-up tables, which require selecting a reference template slide for normalization, are widespread [1–6]. A distinct but closely related issue is stain quantification [7, 8]. Recent publications investigated the use of deep learning approaches with Generative Adversarial Networks (GANs) and showed their benefits compared to the conventional methods [9, 10]. It was also shown that normalizing images using GANs can substantially improve the results of image classification [11] or segmentation [12]. Mahapatra et al. [13] integrate self-supervised semantic information such as geometric and structural patterns at different layers to improve stain normalization with CycleGANs.

In this work, we investigate the potential and limitations of a machine learning-based approach for normalization with a cycle consistent Generative Adversarial Network (CycleGAN), which learns the mapping from one HE-stain variant to another. The approach we follow was proposed by Zhu et al. [14]. An image-to-image mapping is learned between two different HE-stained data sets to generate fake images in each image domain. We apply the technique to two independent data sets: the Mitos-Atypia-14 challenge, which provides two image sets of breast cancer tissue scanned with two different devices, and our HE-Staining Variation (HEV) data set, showing follicular thyroid carcinoma slices stained with different protocols. We evaluate the results using the Fréchet Inception Distance (FID) and the Structural Similarity Index Measure (SSIM). In addition, we apply our method to a tumor classification task of lymph node samples, in which we apply stain normalization to pathological lymph node data from our institute and test the gain from normalization on a ResNet classifier pre-trained on the Camelyon16 data set. We show that normalization can increase the confidence of the classifier not only prior to the training, but also when the test data is normalized to the training set and applied to the model after training. In addition, we demonstrate the potential of using this normalization for recoloring images with stains that are not present in the original data, while at the same time achieving a higher accuracy in our classification task.

We provide both our implementation at https://github.com/m4ln/stainTransfer_CycleGAN_pytorch and our data set for normalization and classification at https://doi.org/10.11588/data/8LKEZF.

Methods

CycleGAN formulation

The CycleGAN from Zhu et al. [14] consists of two generator and discriminator pairs each of which learns the mapping from one image domain to the other. Given the image domains A and B with training images X_A and X_B, the generator G_B learns the mapping from A to B such that G_B:X_A→X_B, while the generator G_A learns the mapping in reverse direction, i.e. G_A:X_B→X_A. A discriminator D is a binary classifier. It decides whether a sample is real (1), i.e. given from the training data set, or fake (0), i.e. produced by the generator. More precisely, discriminator D_B learns to distinguish between real images $X_{B}^{real}$ and generated ones $X_{B}^{fake}$, while in the same way, D_A is trained to discriminate between $X_{A}^{real}$ and $X_{A}^{fake}$.

For training, the objective function to be optimized is modeled by two loss functions: the adversarial loss $\mathcal {L}^{adv}$ [15] and the cycle consistency loss $\mathcal {L}^{cyc}$ [14].

Adversarial loss

Introduced by Goodfellow et al. [15], the adversarial loss refers to the two-player game between the generator and the discriminator networks. More precisely, for the mapping G_B:X_A→X_B, the discriminator D_B is trained to classify $X_{B}^{real}$ and $X_{B}^{fake}$ correctly, while the generator aims for $X_{B}^{fake}$ to be classified as real by the discriminator. In this way, the generator and the discriminator each try to outperform the other. Zhu et al. [14] use the least-squares loss as objective since it ensures stability during training and generates higher quality results. Thus, the adversarial loss function is expressed as follows [14]:

$$\begin{aligned} {}\min_{G_{B}}\max_{D_{B}}\mathcal{L}_{B}^{adv} &= \mathbb{E}_{X_{B}^{real}} \left[ D_{B} \left(X_{B}^{real} \right)^{2} \right]\\ &\quad+ \mathbb{E}_{X_{A}^{real}} \left[ \left(D_{B} \left(G_{B} \left(X_{A}^{real}\right) \right)-1 \right)^{2} \right], \end{aligned} $$

with $\mathbb {E}$ being the expected value over all instances of $X_{A}^{real}$ and $X_{B}^{real}$. In the same way, we can formulate the adversarial loss for the inverse mapping function G_A:X_B→X_A, i.e.

$$\begin{aligned} {}\min_{G_{A}}\max_{D_{A}}\mathcal{L}_{A}^{adv}&= \mathbb{E}_{X_{A}^{real}} \left[ D_{A} \left(X_{A}^{real} \right)^{2} \right]\\ &\quad+ \mathbb{E}_{X_{B}^{real}} \left[ \left(D_{A} \left(G_{A} \left(X_{B}^{real}\right) \right)-1 \right)^{2} \right], \end{aligned} $$

Thus, the total adversarial loss $\mathcal {L}^{adv}$ is obtained by the sum of both terms $\mathcal {L}^{adv}_{A}$ and $\mathcal {L}^{adv}_{B}$.
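As a minimal numerical sketch of the objective above (not the authors' implementation), the following Python snippet evaluates both terms of the least-squares adversarial objective from given discriminator outputs and sums the two directions. The function names and the toy discriminator outputs are hypothetical and serve only as an illustration.

```python
import numpy as np


def adversarial_objective(d_real, d_fake):
    """Least-squares objective as written in the equations above.

    d_real: discriminator outputs D(X^real) for real target-domain images
    d_fake: discriminator outputs D(G(X^real)) for generated images
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    # First term: E[D(X^real)^2]; second term: E[(D(G(X^real)) - 1)^2]
    return np.mean(d_real ** 2) + np.mean((d_fake - 1.0) ** 2)


def generator_term(d_fake):
    """Part of the objective the generator can influence: E[(D(G(X)) - 1)^2]."""
    return np.mean((np.asarray(d_fake, dtype=float) - 1.0) ** 2)


# Toy discriminator outputs (hypothetical values, not taken from the paper)
d_b_real, d_b_fake = [0.9, 0.8], [0.2, 0.4]  # D_B on real B and on G_B(X_A)
d_a_real, d_a_fake = [0.7, 0.9], [0.1, 0.3]  # D_A on real A and on G_A(X_B)

loss_adv_b = adversarial_objective(d_b_real, d_b_fake)
loss_adv_a = adversarial_objective(d_a_real, d_a_fake)
loss_adv = loss_adv_a + loss_adv_b  # total adversarial loss
```

The generator term vanishes when every generated image is scored as 1 (real). Common least-squares GAN implementations train the discriminator by minimizing squared errors against the targets 1 (real) and 0 (fake); how that maps onto the min-max notation above is a matter of convention.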

Cycle consistency loss

Zhu et al. [14] presented this loss function to enforce that both mapping functions G_A and G_B learned by the generators are inverse functions. In other words, if an image is mapped from one domain to the other domain, the backward mapping should bring the image back to its original state. Thus, it must satisfy the cycle $X_{A}^{real} \rightarrow G_{B}(X_{A}^{real}) \rightarrow G_{A}(G_{B}(X_{A}^{real})) = X_{A}^{rec} \approx X_{A}^{real}$ and, in the same way, $X_{B}^{real} \rightarrow G_{A}(X_{B}^{real}) \rightarrow G_{B}(G_{A}(X_{B}^{real})) = X_{B}^{rec} \approx X_{B}^{real}$ for the backward mapping. Therefore, the total cycle consistency loss is given by:

$$\begin{aligned} {}\mathcal{L}^{cyc}&= \underbrace{\mathbb{E}_{X_{A}^{real}} \left[ \left\| G_{A} \left(G_{B} \left(X_{A}^{real}\right) \right) - X_{A}^{real} \right\|_{1} \right]}_{\mathcal{L}^{cyc}_{A}}\\ &\quad+ \underbrace{\mathbb{E}_{X_{B}^{real}} \left[ \left\| G_{B} \left(G_{A} \left(X_{B}^{real}\right) \right) - X_{B}^{real} \right\|_{1} \right]}_{\mathcal{L}^{cyc}_{B}}, \end{aligned} $$

where ∥·∥₁ denotes the ℓ₁-Norm.

Hence, the total loss function is

$$\underset{G_{B}, G_{A}}{\arg\min} \ \underset{D_{B}, D_{A}}{\arg\max} \ \mathcal{L} = \mathcal{L}^{adv} + \lambda \mathcal{L}^{cyc}, $$

with λ being a regularization factor to control the relative importance of the adversarial and cycle consistency losses.
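The cycle consistency term and its weighting by λ can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the generators are replaced by hypothetical toy functions (simple intensity scalings), the images are random arrays, and the ℓ₁-norm is taken as the plain sum of absolute differences as in the equation (implementations often average over pixels instead).

```python
import numpy as np


def l1_norm(x):
    """Sum of absolute values of all entries."""
    return np.sum(np.abs(x))


def cycle_loss(real_a, real_b, g_a, g_b):
    """Cycle consistency loss: mean L1 reconstruction error of both cycles.

    real_a, real_b: lists of image arrays from domains A and B
    g_a: generator mapping B -> A, g_b: generator mapping A -> B
    """
    loss_a = np.mean([l1_norm(g_a(g_b(x)) - x) for x in real_a])  # A -> B -> A
    loss_b = np.mean([l1_norm(g_b(g_a(x)) - x) for x in real_b])  # B -> A -> B
    return loss_a + loss_b


def total_loss(loss_adv, loss_cyc, lam=10.0):
    """Total objective: adversarial loss plus lambda-weighted cycle loss."""
    return loss_adv + lam * loss_cyc


# Hypothetical toy generators that are exact inverses of each other
g_b = lambda x: 2.0 * x
g_a = lambda x: 0.5 * x

rng = np.random.default_rng(0)
real_a = [rng.random((4, 4, 3)) for _ in range(3)]
real_b = [rng.random((4, 4, 3)) for _ in range(3)]

loss_cyc = cycle_loss(real_a, real_b, g_a, g_b)  # zero for exact inverses
loss = total_loss(loss_adv=1.5, loss_cyc=loss_cyc)  # loss_adv is a made-up value
```

Because the toy generators invert each other exactly, the cycle term is zero and the total loss reduces to the adversarial part; any deviation from invertibility is penalized λ times as strongly as a unit change in the adversarial loss.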

Figure 1 illustrates the CycleGAN structure for mapping an image from domain A to domain B by $G_{B}: X_{A}^{real}\rightarrow X_{B}^{fake}$ and backwards by $G_{A}: X_{B}^{fake}\rightarrow X_{A}^{rec}$. The discriminator D_B tries to identify whether an image is generated ($X_{B}^{fake}$) or real ($X_{B}^{real}$). During training, the network is optimized by computing the adversarial loss $\mathcal {L}^{adv}$ and the cycle consistency loss $\mathcal {L}^{cyc}$. The same process is done for the reverse direction when a real sample image $X_{B}^{real}$ is mapped from domain B to domain A, i.e. $X_{B}^{real} \xrightarrow {G_{A}} X_{A}^{fake} \xrightarrow {G_{B}} X_{B}^{rec}$.

Data sets

Two data sets are used to assess the quality of stain normalization using CycleGAN: (a) the Mitos-Atypia-14 challenge data set, in which the HE-stain in images appears different in color and resolution due to different scanning devices, and (b) our clinical HE-Staining Variation (HEV) data set, which contains images of serial sections that were subjected to different staining protocols. In addition, we apply our CycleGAN normalization method in a tumor lymph node classification task trained on the Camelyon16 data set [16] and tested on the TumorLymphnode data set collected at our institute.

Mitos-Atypia-14

This is a publicly available challenge data set containing breast cancer images [17]. The tissue was HE-stained and scanned by two different whole-slide image (WSI) scanners: the Aperio ScanScope XT and the Hamamatsu Nanozoomer 2.0-HT. The two devices produce images with different resolutions, 1539×1376 pixels for the Aperio and 1663×1485 pixels for the Hamamatsu, at ×20 and ×40 magnification. From each scanned set, 7,936 tiles are selected for training and 15,000 tiles for testing. We resize the images to 1024×1024 pixels and extract image tiles of 256×256 pixels as input to our network.
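The tiling step can be sketched as below. The text does not state whether the tiles overlap; this sketch assumes non-overlapping tiles, so that a 1024×1024 image yields 16 tiles of 256×256 pixels. The function name `extract_tiles` and the placeholder image are hypothetical, and the preceding resize step is omitted.

```python
import numpy as np


def extract_tiles(image, tile=256):
    """Split an H x W x C image into non-overlapping tile x tile patches."""
    h, w, c = image.shape
    if h % tile or w % tile:
        raise ValueError("image size must be a multiple of the tile size")
    rows, cols = h // tile, w // tile
    # Split both spatial axes into (block index, offset within block),
    # then bring the two block indices to the front.
    tiles = image.reshape(rows, tile, cols, tile, c).swapaxes(1, 2)
    return tiles.reshape(rows * cols, tile, tile, c)


# Placeholder for an already resized 1024 x 1024 RGB frame
image = np.zeros((1024, 1024, 3), dtype=np.uint8)
tiles = extract_tiles(image)  # array of shape (16, 256, 256, 3)
```

Tiles are returned in row-major order, i.e. the first four tiles cover the top quarter of the image from left to right.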

HE-Staining variation

At the Institute of Pathology, Medical Faculty Mannheim, Heidelberg University, we collected serial sections of a follicular thyroid carcinoma and stained them with the following HE-staining variants: HE-stain with the standard protocol of the Institute of Pathology, Mannheim (henceforth HE), intentionally stained too short (henceforth shortHE), intentionally stained too long (henceforth longHE), stained only with hematoxylin (henceforth onlyH), and stained only with eosin (henceforth onlyE). Figure 2 shows thumbnails from each WSI. For each set, we extract tiles of 256×256 pixels. We collect 10,000 and 15,000 tiles for training and testing, respectively. The whole data set including our training patches is made publicly available [18].

Image classification

The effect of HE-staining normalization is evaluated on a tumor classification task using two sets of data: (1) the Camelyon16 data set [16], containing the classes normal lymph node (n=3,318) and carcinoma infiltration (n=3,591), and (2) slides with normal lymph nodes and carcinoma infiltration from a period of 15 years (hereinafter called TumorLymphnode or, in short, TL data set), collected at the Institute of Pathology, Heidelberg University. These are segmented manually and then cropped into tiles (n=3,605 with normal lymph node parts, n=3,600 with carcinoma infiltration).

Training details

CycleGAN

In the first setting, we train our model on the Mitos-Atypia-14 challenge to learn the mapping between the two image sets X_A and X_B obtained by the scanners Aperio and Hamamatsu, respectively. We then train four additional models on the HEV data set, with set A being the standard stained tissue (see Fig. 2A) and set B being one of the other stained tissues (see Fig. 2B-E). For each model we choose 10,000 images for training. An overview of the experiments on stain normalization is given in the Appendix.

For the classification task, we train a CycleGAN to normalize the Camelyon16 data set to the standard HE and hematoxylin staining of the HEV data set and to the TL data set. In the same way we train another model to normalize the TL data set to the Camelyon16 data set and to the HE and hematoxylin staining of the HEV data set. For each image set we choose 5,000 images for training.

For all models we use the same network architecture as described by Zhu et al. [14]. We train each network for 60 epochs in total, where the initial learning rate is set to 2×10⁻⁴ and then decreases to zero after every 30 epochs. The regularization factor λ is set to 10 for all experiments. The Adam optimizer is used (β₁=0.5, β₂=0.999) with a batch size of 1. We train and evaluate the models on an NVIDIA Quadro P6000 graphics card.
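The description of the learning-rate schedule leaves some room for interpretation. One plausible reading, which is an assumption here and not confirmed by the text, is the schedule commonly used with CycleGAN: the rate stays at its initial value for the first 30 epochs and then decays linearly to zero over the remaining 30 epochs. The function and parameter names below are hypothetical.

```python
def learning_rate(epoch, base_lr=2e-4, n_constant=30, n_decay=30):
    """Learning rate for a 0-based epoch index.

    Assumed schedule: constant for n_constant epochs, then linear decay
    that reaches zero once n_constant + n_decay epochs have passed.
    """
    if epoch < n_constant:
        return base_lr
    remaining = n_constant + n_decay - epoch
    return base_lr * max(0, remaining) / n_decay


# Learning rate for each of the 60 training epochs
schedule = [learning_rate(epoch) for epoch in range(60)]
```

Under this reading, the rate is 2×10⁻⁴ up to and including epoch index 30, is halved by epoch index 45 and would reach zero at epoch index 60, i.e. right after the last training epoch.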

ResNet

A ResNet [19], in its PyTorch implementation, is trained on the original Camelyon16 data set and on its versions normalized to the HEV and TL data sets (using CycleGAN) to classify images into the two classes normal lymph node and carcinoma infiltration. Each network is trained for 100 epochs in total with a learning rate of 2×10⁻⁴.

Evaluation metrics

To assess the quality of the images generated by the CycleGAN, we use the Fréchet Inception Distance (FID) to compare the distribution of generated images with the distribution of real images. In addition, the successful training of both generators is assessed by computing the Structural Similarity Index Measure (SSIM) in various windows between a real image and its reconstruction through the generator. For the classification task, we calculate kappa-values from the confusion matrix.

FID

This metric consists of the Fréchet distance, also known as the Wasserstein-2 distance, computed on the basis of feature vectors. Here, a feature vector is the 2048-dimensional output of a pre-trained Inception v3 model applied to one image. For the whole set of input images we get a sample of feature vectors with m₁ as its mean and C₁ as its covariance, while for the GAN output images we get m₂ and C₂, respectively [20]. The Fréchet distance is then applied to calculate the minimum distance between the means and covariances [21]:

$$\begin{aligned} {}d^{2}\ \left((m_{1}, C_{1}),(m_{2}, C_{2}) \right)&= ||m_{1} - m_{2}||^{2}\ \\ &\quad+ Tr\left(C_{1} + C_{2} - 2 \sqrt{C_{1} {\ast} C_{2}}\right). \end{aligned} $$

For identical image sets the FID is zero, whereas it increases with noise and disturbances. We use the Python code from [24] to compute FID scores.
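To make the formula concrete, the following sketch computes the Fréchet distance from two sets of feature vectors with NumPy. It is not the code used in the paper: the Inception v3 feature extraction is left out, random vectors stand in for the features, and the function names are hypothetical. The trace of the matrix square root is obtained from the eigenvalues of C₁C₂, which are real and non-negative for positive semi-definite covariance matrices.

```python
import numpy as np


def feature_statistics(features):
    """Mean vector and covariance matrix of an (n_samples, n_features) array."""
    features = np.asarray(features, dtype=float)
    return features.mean(axis=0), np.cov(features, rowvar=False)


def frechet_distance(m1, c1, m2, c2):
    """Squared Frechet distance between Gaussians (m1, C1) and (m2, C2)."""
    m1, m2 = np.asarray(m1, dtype=float), np.asarray(m2, dtype=float)
    c1, c2 = np.asarray(c1, dtype=float), np.asarray(c2, dtype=float)
    diff = m1 - m2
    # Tr(sqrt(C1 C2)) equals the sum of the square roots of the eigenvalues
    # of C1 C2; tiny negative or imaginary parts are numerical noise.
    eigvals = np.linalg.eigvals(c1 @ c2)
    tr_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    return float(diff @ diff + np.trace(c1) + np.trace(c2) - 2.0 * tr_sqrt)


# Random stand-ins for Inception features (8 dimensions instead of 2048)
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 8))
shifted = rng.normal(loc=0.5, size=(200, 8))

fid_same = frechet_distance(*feature_statistics(real), *feature_statistics(real))
fid_shifted = frechet_distance(*feature_statistics(real), *feature_statistics(shifted))
```

Comparing a set with itself gives a distance of zero up to rounding, while the shifted set yields a clearly larger value, mirroring the behavior described above.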

SSIM

For a given original image x and the corresponding output of the GAN y the features luminance l(x,y), contrast c(x,y) and structure s(x,y) are compared on basis of the respective average, variance and covariance. The product of these components with the weighting factors α,β,γ yields the SSIM:

$$\mathrm{SSIM}(\mathbf{x},\mathbf{y})=\left[l(\mathbf{x},\mathbf{y})\right]^{\alpha}\cdot\left[c(\mathbf{x},\mathbf{y})\right]^{\beta}\cdot\left[s(\mathbf{x},\mathbf{y})\right]^{\gamma}. $$

The SSIM metric is applied to image pairs showing the same image content. It ranges from 0 to 1 and equals one only for exactly identical images. An SSIM close to zero indicates that the images are hardly similar [23]. We use the Python code from [22] to compute SSIM scores.
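The three components can be written out in a few lines. The sketch below is a simplified, single-window version that treats the whole image as one window, whereas the evaluation in this paper computes SSIM in various windows; it is therefore not a replacement for the code cited above. The stabilizing constants follow the usual choice from Wang et al. (K₁=0.01, K₂=0.03, C₃=C₂/2), and the function name `global_ssim` is hypothetical.

```python
import numpy as np


def global_ssim(x, y, data_range=255.0, alpha=1.0, beta=1.0, gamma=1.0):
    """Single-window SSIM: luminance, contrast and structure over the full image."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast term
    c3 = c2 / 2.0                  # stabilizes the structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    sd_x, sd_y = np.sqrt(var_x), np.sqrt(var_y)
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    lum = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    con = (2 * sd_x * sd_y + c2) / (var_x + var_y + c2)
    struct = (cov_xy + c3) / (sd_x * sd_y + c3)
    return lum ** alpha * con ** beta * struct ** gamma


# Synthetic 8 x 8 gray-value ramp as a stand-in for an image tile
tile = np.arange(64, dtype=float).reshape(8, 8) * 4.0
score_identical = global_ssim(tile, tile)
score_brighter = global_ssim(tile, np.clip(tile + 40.0, 0.0, 255.0))
```

With all weighting factors set to one, an image compared with itself scores one (up to rounding), and a brightened copy scores lower because the luminance and contrast terms drop.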

Cohen’s kappa

The kappa-values are calculated from the confusion matrix using scikit-learn’s function cohen_kappa_score [25]. The kappa score ranges from -1 to 1. A value ≥ 0.8 is considered as almost perfect agreement while ≤ 0 means no agreement (i.e. random labels) [26].
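For readers who want to see what the score measures, Cohen's kappa can also be computed directly from a confusion matrix: it relates the observed agreement to the agreement expected by chance from the row and column totals. The following dependency-free sketch is an illustration, not the scikit-learn implementation; the function name and the example matrices are hypothetical.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal
    observed = sum(confusion[i][i] for i in range(k)) / n
    # Chance agreement: product of marginal frequencies, summed over classes
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Hypothetical two-class confusion matrices (normal vs. tumor)
kappa_perfect = cohen_kappa([[50, 0], [0, 50]])
kappa_chance = cohen_kappa([[25, 25], [25, 25]])
kappa_partial = cohen_kappa([[40, 10], [20, 30]])
```

In the last example the observed agreement is 0.7 and the chance agreement is 0.5, which gives a kappa of 0.4; perfect agreement gives 1 and chance-level agreement gives 0.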

Results

We show the visual and evaluation results of our CycleGAN experiments on the Mitos-Atypia-14 and HEV data sets. For generators G_A and G_B, image tiles from image domains A and B can be normalized in both directions such that $X_{A}^{real}\xrightarrow {G_{B}} X_{B}^{fake} \xrightarrow {G_{A}} X_{A}^{rec}$ and $X_{B}^{real} \xrightarrow {G_{A}} X_{A}^{fake} \xrightarrow {G_{B}} X_{B}^{rec} $. In addition, we show the results of our experiments on the lymph node classification task with HE-stain normalization.

CycleGAN visual results

Mitos-Atypia-14

Example results on the Mitos-Atypia-14 data set are shown in Fig. 3. Columns A-C refer to the image tiles scanned by the Aperio scanner ($X_{A}^{real}$) being mapped by the generator G_B to produce the corresponding image in the domain of the Hamamatsu scanner ($X_{B}^{fake}$) and the reconstruction from mapping the image back to its original domain ($X_{A}^{rec}$). The same process is done in the reverse direction for image tiles scanned in domain B being mapped to domain A and backward (columns D-F). Each of the rows 1-4 presents a different example image.

HE-Staining variation

Figure 4 presents several test results when mapping a standard stained HE-image $X_{A}^{real}$ to one of the four stains of domain $X_{B}^{fake}$. Each block A-D shows another example tissue section. The top row of each block represents an exemplary image tile of the stain to be mapped into (shortHE, longHE, onlyH, onlyE), while the bottom row depicts the input image (HE) and the corresponding output for each stain.

CycleGAN evaluation

For all five experiments, FID scores are shown in Fig. 5A. As a reference, FID scores between all testing images from data sets A and B (blue) are computed. They range from 31.5 (Mitos-Atypia-14, MA14) to 203.68 (onlyE). On average, our experiments achieve FID scores for real vs. fake of 7.09 (A) and 6.93 (B), while for real vs. rec we obtain an average of 5.76 (A) and 5.58 (B). When mapping images from a source domain to a target domain, the FID scores compared to original images from the target domain improve by up to 96% (blue vs. orange, red, green and purple). More precisely, the improvement per experiment is 76.85% (MA14), 91.93% (shortHE), 89.23% (longHE), 95.76% (onlyH), 95.57% (onlyE). A table with all FID scores is presented in the Appendix.
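The percentages above are presumably relative reductions of the FID with respect to the reference score between the two original sets; this reading is an assumption, as the exact formula is not stated. Under that assumption, the computation is a one-liner. The function name and the example values are hypothetical and not taken from the paper's tables.

```python
def relative_improvement(reference_fid, mapped_fid):
    """Percentage by which the FID drops relative to the reference FID."""
    if reference_fid <= 0:
        raise ValueError("reference FID must be positive")
    return 100.0 * (reference_fid - mapped_fid) / reference_fid


# Illustrative values only: a reference FID of 100 reduced to 4 after mapping
improvement = relative_improvement(100.0, 4.0)
print(improvement)  # 96.0
```

Since a lower FID means more similar distributions, a larger percentage corresponds to generated images that are closer to the target domain.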

In addition, SSIM scores (see Fig. 5B) are computed between the real and their reconstructed images for each image domain A (blue) and B (orange). Each value refers to the average SSIM for all test images and the bars represent the corresponding standard deviation (SD). For each set A we obtain SSIM scores in the range of SSIM=0.94 (SD=0.02) (MA14) and SSIM=0.97 (SD=0.01) (onlyH), whereas for set B we obtain scores between SSIM=0.96 (SD=0.02) (MA14) and SSIM=0.98 (SD=0.01) (onlyH). A table with all SSIM scores is presented in the Appendix.

Stain normalization in image classification

In order to show the impact of stain normalization on a downstream task such as image classification or segmentation, a ResNet [19] was trained on the Camelyon16 data set and on each of its normalized versions. It should be noted that a standard color normalization to the mean and standard deviation as well as data augmentation with the albumentations package [27] is carried out in the standard training setting. The model learns to classify the images into lymph node parts with (hereinafter called tumor) and without (hereinafter called normal) carcinoma infiltration. Our three ResNets are tested on the Camelyon16 data set (n=830 images without and n=898 images with tumor) in its original version, normalized to the HEV data set, and normalized to the TL data set, as well as on the TL data set (n=902 images without and n=900 images with tumor) in its original version, normalized to the HEV data set, and normalized to the Camelyon16 data set. As shown in Table 1, all ResNet models work best with images that have the same HE-staining as their training data set (Camelyon16 original, normalized to the HEV or TL data set). Furthermore, the models transfer best to those images from the TL data set that have the same HE-staining or stain normalization. However, with a best kappa-value of 0.55, this transfer from the Camelyon16 data set to the TumorLymphnode data set does not work very well. The ResNet model trained on hematoxylin-normalized images is also able to classify other hematoxylin images correctly (kappa-value 0.77 for the Camelyon16 data set and 0.54 for the TL data set). The complete table of the results is shown in the Appendix.

Table 1 Kappa-values for the ResNet models that were trained on different versions of the Camelyon16 data set

Table 2 Overview of our stain normalization experiments

Discussion

This paper underlines that CycleGANs for image normalization in the context of digital pathology [12] yield a clear clinical benefit. The trained CycleGAN models show compelling results, both visually (Figs. 3 and 4) and quantitatively (Fig. 5) by obtaining FID scores up to 96% better for images mapped to a target domain. The trained models are able to fully convert to the desired color scheme while preserving the structural contents of the original image due to the cycle consistency constraint leading to SSIM scores greater than 0.9 when mapping generated images back to their source domain.

Furthermore, we could confirm that this normalization approach can be beneficial for image classification tasks, similar to the results of Shaban et al. [10] for the Camelyon16 data set, Yuan et al. [28] for a number of publicly available data sets, or de Bel et al. [12] for image segmentation. Compared to earlier approaches on public data sets, in our work we demonstrate how stain normalization improves the classification accuracy when applied to internal data with a pre-trained model. When normalizing HE-stained images from our institute to the Camelyon16 data set and testing a ResNet model that is trained only on Camelyon16 data, the kappa-value increased by more than 50%. Furthermore, we were able to show in a first approach that such a normalization could also enable the use of differently colored data sets for downstream tasks, e.g. by converting both data sets to a different staining. However, further work, e.g. using a wider range of staining methods, is certainly necessary here.

Some limitations of the used CycleGAN-model can also be seen when mapping images obtained by different scanning devices with varying resolutions. This can cause a loss in structural information despite the consistently good quality of the color normalization. With the HEV data set, the generated images look very realistic compared to the original images in a target domain without any decline in the image content (see Fig. 4).

The CycleGAN approach used here is able to successfully learn every mapping between two image stains and can instantly normalize any unseen image if it is within one of the trained stains. For any other staining, the network needs to be retrained from scratch. However, the network is able to learn even from a small number of images (1,000-10,000 per set), which can be obtained from a single WSI. In addition, the images do not have to be labeled or paired to learn the mapping between two domains. The network can learn to add a stain to images which is not present in the source domain, e.g. we are able to create a full HE-stained image from an image which has only a single stain (onlyH, onlyE) or vice versa. This can simplify the manual staining process. How this affects other stains besides HE needs further investigation.

Conclusion

In this paper we show that CycleGANs are a powerful tool for normalization of different variants of HE-stains and tissue types. We validated this approach on data sets covering images from different scanning devices, staining protocols and tissue types and additionally showed the impact of stain normalization on the downstream task of image classification. The method has been successfully applied to compensate for variances resulting from image acquisition as well as from tissue staining while preserving the structural content of the images. In order to make use of this approach in a clinical setting, the training process should be accelerated, e.g. by using transfer learning, an increased batch size and specialized hardware. The method may be added to various image processing frameworks at WSI level to be applied to tasks such as classification or segmentation.

Appendix

Table 3 FID scores for all experiments between real and generated (fake, rec) images for A and B

Table 4 SSIM scores (SD = standard deviation) for all experiments between real and rec images for A and B

Table 5 Kappa-values for the ResNet models that were trained on different versions of the Camelyon16 data set

Availability of data and materials

The code is publicly available at https://github.com/m4ln/stainTransfer_CycleGAN_pytorch. The data set supporting the solutions is available at https://doi.org/10.11588/data/8LKEZF.

Abbreviations

HE: Hematoxylin-eosin
GAN: Generative adversarial network
FID: Fréchet inception distance
SSIM: Structural similarity index measure

References

Bianconi F, Kather JN, Reyes-Aldasoro CC. Experimental assessment of color deconvolution and color normalization for automated classification of histology images stained with hematoxylin and eosin. Cancers. 2020; 12(11). https://doi.org/10.3390/cancers12113337.
Bukenya F. A hybrid approach for stain normalisation in digital histopathological images. Multimed Tools Appl. 2020; 79(3):2339–2362. https://doi.org/10.1007/s11042-019-08262-0.
Article Google Scholar
Vicory J, Couture HD, Thomas NE, Borland D, Marron JS, Woosley J, Niethammer M. Appearance normalization of histology slides. Comput Med Imaging Graph. 2015; 43:89–98. https://doi.org/10.1016/j.compmedimag.2015.03.005.
Article Google Scholar
Khan AM, Rajpoot N, Treanor D, Magee D. A nonlinear mapping approach to stain normalization in digital histopathology images using image-specific color deconvolution. IEEE Trans Biomed Eng. 2014; 61(6):1729–1738. https://doi.org/10.1109/TBME.2014.2303294.
Article Google Scholar
Macenko M, Niethammer M, Marron JS, Borland D, Woosley JT, Guan X, Schmitt C, Thomas NE. A method for normalizing histology slides for quantitative analysis. In: 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro: 2009. p. 1107–1110. https://doi.org/10.1109/ISBI.2009.5193250.
Bautista PA, Yagi Y. Staining correction in digital pathology by utilizing a dye amount table. J Digit Imaging. 2015; 28(3):283–294. https://doi.org/10.1007/s10278-014-9766-0.
Article Google Scholar
Ruifrok A, Johnston D. Quantification of histochemical staining by color deconvolution. Anal Quant Cytol Histol. 2001; 23(4):291–9.
CAS PubMed Google Scholar
Reinhard E, Adhikhmin M, Gooch B, Shirley P. Color transfer between images. IEEE Comput Graph Appl. 2001; 21(5):34–41. https://doi.org/10.1109/38.946629.
Article Google Scholar
Ghazvinian Zanjani F, Zinger S, Ehteshami Bejnordi B, van der Laak J, With P. Stain normalization of histopathology images using generative adversarial networks: 2018. p. 573–577. https://doi.org/10.1109/ISBI.2018.8363641.
Shaban MT, Baur C, Navab N, Albarqouni S. StainGAN: stain style transfer for digital histological images. 2018. http://arxiv.org/abs/1804.01601. Accessed 09 Dec 2020.
Swiderska-Chadaj Z, de Bel T, Blanchet L, Baidoshvili A, Vossen D, van der Laak J, Litjens G. Impact of rescanning and normalization on convolutional neural network performance in multi-center, whole-slide classification of prostate cancer. Sci Rep. 2020; 10(1):14398. https://doi.org/10.1038/s41598-020-71420-0.
Article CAS Google Scholar
de Bel T, Hermsen M, Kers J, van der Laak J, Litjens G. Stain-transforming cycle-consistent generative adversarial networks for improved segmentation of renal histopathology In: Cardoso MJ, Feragen A, Glocker B, Konukoglu E, Oguz I, Unal G, Vercauteren T, editors. Proceedings of The 2nd International Conference on Medical Imaging with Deep Learning. Proceedings of Machine Learning Research, vol. 102. London: PMLR: 2019. p. 151–163. http://proceedings.mlr.press/v102/de-bel19a.html.
Google Scholar
Mahapatra D, Bozorgtabar B, Thiran J-P, Shao L. Structure Preserving Stain Normalization of Histopathology Images Using Self-Supervised Semantic Guidance. 2020. http://arxiv.org/abs/2008.02101. Accessed 09 Dec 2020.
Zhu J-Y, Park T, Isola P, Efros AA. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. 2020. http://arxiv.org/abs/1703.10593. Accessed 09 Dec 2020.
Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative Adversarial Networks. 2014. http://arxiv.org/abs/1406.2661. Accessed 09 Dec 2020.
Ehteshami Bejnordi B, Veta M, Johannes van Diest P, van Ginneken B, Karssemeijer N, Litjens G, van der Laak JAWM, the CAMELYON16 Consortium. Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer. JAMA. 2017; 318(22):2199–2210. https://doi.org/10.1001/jama.2017.14585. https://jamanetwork.com/journals/jama/articlepdf/2665774/jama_ehteshami_bejnordi_2017_oi_170113.pdf.
Article Google Scholar
MITOS-ATYPIA-14 Grand Challenge. https://mitos-atypia-14.grand-challenge.org/. Accessed 22 Mar 2020.
Runz M, Weis C-A. Normalization of HE-Stained Histological Images using Cycle Consistent Generative Adversarial Networks [Dataset]. heiDATA. 2021. https://doi.org/10.11588/data/8LKEZF.
He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. 2015. http://arxiv.org/abs/1512.03385. Accessed 11 May 2021.
Brownlee J. How to Implement the Frechet Inception Distance (FID) for Evaluating GANs. 2019. https://machinelearningmastery.com/how-to-implement-the-frechet-inception-distance-fid-from-scratch/. Accessed 15 Jan 2021.
Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. 2018. http://arxiv.org/abs/1706.08500. Accessed 15 Jan 2021.
w13b3. SSIM-py: Structural Similarity (SSIM) index, where the core dependency is NumPy. 2019. https://github.com/w13b3/SSIM-py. Accessed 20 Dec 2020.
Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process. 2004; 13(4):600–612. https://doi.org/10.1109/TIP.2003.819861.
Seitzer M. pytorch-fid: FID Score for PyTorch. 2020. https://github.com/mseitzer/pytorch-fid. Version 0.1.1. Accessed 18 May 2021.
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E. Scikit-learn: Machine learning in Python. J Mach Learn Res. 2011; 12:2825–2830.
McHugh ML. Interrater reliability: the kappa statistic. Biochemia Medica. 2012; 22(3):276–282. PMID: 23092060.
Buslaev A, Iglovikov VI, Khvedchenya E, Parinov A, Druzhinin M, Kalinin AA. Albumentations: Fast and flexible image augmentations. Information. 2020; 11(2):125. https://doi.org/10.3390/info11020125.
Yuan E, Suh J. Neural stain normalization and unsupervised classification of cell nuclei in histopathological breast cancer images. arXiv preprint arXiv:1811.03815. 2018.

Acknowledgements

We thank our project partner Smart In Media AG for its collaboration on this project.

Funding

Our work is supported by the grant ZIM ZF4689501TS9. Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Institute of Pathology, University Medical Centre Mannheim, Heidelberg University, Mannheim, Germany
Marlen Runz, Daniel Rusche & Cleo-Aron Weis
Mannheim Institute for Intelligent Systems in Medicine (MIISM), Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
Marlen Runz & Jürgen Hesser
Center for Mass Spectrometry and Optical Spectroscopy (CeMOS), Mannheim University of Applied Sciences, Mannheim, Germany
Stefan Schmidt
Smart In Media AG, Köln, Germany
Martin R. Weihrauch
Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Heidelberg, Germany
Jürgen Hesser
Central Institute for Computer Engineering (ZITI), Heidelberg University, Heidelberg, Germany
Jürgen Hesser

Authors

Marlen Runz
Daniel Rusche
Stefan Schmidt
Martin R. Weihrauch
Jürgen Hesser
Cleo-Aron Weis

Contributions

M. Runz: study design and coordination, data formation, model training, statistical evaluation, manuscript drafting. D. Rusche: statistical evaluation, manuscript drafting. S. Schmidt: manuscript drafting. M. R. Weihrauch: manuscript drafting. J. Hesser: study design and coordination, manuscript drafting. C.-A. Weis: study design and coordination, data formation, model training, statistical evaluation, manuscript drafting. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Marlen Runz.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Runz, M., Rusche, D., Schmidt, S. et al. Normalization of HE-stained histological images using cycle consistent generative adversarial networks. Diagn Pathol 16, 71 (2021). https://doi.org/10.1186/s13000-021-01126-y

Received: 24 February 2021
Accepted: 05 July 2021
Published: 06 August 2021
DOI: https://doi.org/10.1186/s13000-021-01126-y
