Cell nuclei extraction from breast cancer histopathology images using colour, texture, scale and shape information

Cell nuclei extraction from Haematoxylin and Eosin (H&E) stained breast cancer slide images is a challenging task due to the high content complexity of images: nuclei have heterogeneous appearance and overlap while the background is complex and non-homogeneous. This causes standard extraction methods to perform poorly.

Antoine Veillard 1,2*, Maria S Kulikova 1, Daniel Racoceanu 1,2
From 11th European Congress on Telepathology and 5th International Congress on Virtual Microscopy, Venice, Italy. 6-9 June 2012

Background
Standard cancer diagnosis and prognosis procedures such as the Nottingham Grading System for breast cancer incorporate a criterion based on cell morphology known as cytonuclear atypia. Therefore, algorithms able to precisely extract the cell nuclei are a requirement in computer-aided diagnosis applications.
However, unlike other modalities such as needle aspiration biopsy images, H&E stained surgical breast cancer slides are a particularly challenging image modality due to the heterogeneity of both the objects and background, low object-background contrast and frequent overlaps as illustrated in Figure 1. As a consequence, existing extraction methods which are largely reliant on color intensities do not perform well on such images.

Materials and methods
We propose a method based on the creation of a new image modality consisting of a grayscale map in which the value of each pixel indicates its probability of belonging to a cell nucleus. This probability map is calculated from texture and scale information in addition to simple pixel color intensities. The resulting modality has a strong object-background contrast and evens out the irregularities within the nuclei and the background. The actual extraction is performed using an active contour (AC) model that includes a nuclei shape prior to deal with overlapping nuclei.
The same process is repeated at 4 different scales after locally re-sampling the image using Lanczos-3 sinc kernels. Re-sampled images are locally computed around each pixel to allow the computation of the 15 texture features for the same pixel at different scales. Local texture features are computed at 1:1, 1:2, 1:4 and 1:8 scales for every pixel.
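The multi-scale re-sampling step can be illustrated with a minimal sketch. It is a 1-D simplification of the 2-D case and not the authors' implementation: the function names, the border clamping and the window size are assumptions made for illustration only. The Lanczos-3 kernel is the product of two sinc functions, stretched by the scale factor so that it also acts as a low-pass filter when sampling at 1:2, 1:4 and 1:8.

```python
import math


def lanczos_kernel(x, a=3):
    """Lanczos kernel: sinc(x) * sinc(x / a) for |x| < a, and 0 elsewhere."""
    if x == 0.0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def resample_at(signal, position, scale=1, a=3):
    """Estimate a 1-D signal at `position` for a 1:`scale` re-sampling.

    The kernel is stretched by `scale` and the weights are normalised
    so that a constant signal is reproduced exactly.
    """
    radius = a * scale
    first = math.floor(position - radius) + 1
    last = math.floor(position + radius)
    total = 0.0
    weight_sum = 0.0
    for k in range(first, last + 1):
        idx = min(max(k, 0), len(signal) - 1)  # assumption: clamp at borders
        w = lanczos_kernel((position - k) / scale, a)
        total += w * signal[idx]
        weight_sum += w
    return total / weight_sum


def multiscale_windows(signal, centre, half_width=2, scales=(1, 2, 4, 8)):
    """Local windows around `centre`, one per scale (hypothetical helper)."""
    return {
        s: [resample_at(signal, centre + s * j, s)
            for j in range(-half_width, half_width + 1)]
        for s in scales
    }
```

Computing the windows locally around each pixel, rather than re-sampling the whole image once per scale, keeps every window centred on the same pixel at all four scales, so the texture features at different scales describe the same location.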

Probability map
The resulting 180-dimensional feature vector $x$ is used to compute the probability $p_n(x)$ that each pixel belongs to a cell nucleus. Let $\mu_n$ (resp. $\mu_b$) be the mean of the feature vectors for the pixels belonging to the nuclei (resp. to the background). A class-dependent LDA is performed in order to find two directions in the feature space, $w_n$ and $w_b$, such that the projection of the classes on these directions maximises the ratio of inter-class scatter to within-class scatter. The estimated class probability associated with the feature vector $x$ is then calculated from the linear scores $l_n = (x - \mu_n) \cdot w_n$ and $l_b = (x - \mu_b) \cdot w_b$ using the softmax function:
$$p_n(x) = \frac{e^{l_n}}{e^{l_n} + e^{l_b}}$$
The resulting probability map exhibits strong contrast between the objects and the background. Moreover, nuclei and background appear more homogeneous than in the original image. A post-processing step is also applied to fill small holes still remaining in nuclei (larger holes are not removed, to prevent the unintended deletion of interstices between different nuclei).
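As a minimal sketch of the probability computation, assuming the directions and class means have already been estimated, the two linear scores can be passed through a two-class softmax as follows. The function name and the toy vectors are hypothetical; the real feature vectors are 180-dimensional and the directions come from the class-dependent LDA.

```python
import math


def nucleus_probability(x, mu_n, w_n, mu_b, w_b):
    """Two-class softmax of the linear scores l_n and l_b.

    l_n = (x - mu_n) . w_n and l_b = (x - mu_b) . w_b, as defined in the text.
    """
    l_n = sum((xi - mi) * wi for xi, mi, wi in zip(x, mu_n, w_n))
    l_b = sum((xi - mi) * wi for xi, mi, wi in zip(x, mu_b, w_b))
    # Subtract the larger score before exponentiating to avoid overflow.
    m = max(l_n, l_b)
    e_n = math.exp(l_n - m)
    e_b = math.exp(l_b - m)
    return e_n / (e_n + e_b)


# Toy 2-D example (hypothetical values, for illustration only).
p = nucleus_probability(
    x=[0.8, 0.3],
    mu_n=[0.5, 0.2], w_n=[1.0, 2.0],
    mu_b=[0.1, 0.6], w_b=[-1.0, 0.5],
)
```

With these toy values the nuclei score exceeds the background score, so `p` is above 0.5; equal scores give exactly 0.5.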

AC model including shape prior
The actual extraction of cell nuclei is performed from the probability map with an AC model with shape prior information. The total energy $E(\gamma)$ associated with a contour $\gamma$ is a weighted sum of an image term $E_i(\gamma)$ and a shape term $E_s(\gamma)$. The latter is itself the weighted sum of a smoothing term $E_{sm}(\gamma)$ and a shape prior term $E_{sp}(\gamma)$.
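Written out, the decomposition described above takes the following form. The weight symbols $\lambda_i$, $\lambda_s$, $\lambda_{sm}$ and $\lambda_{sp}$ are notation assumed here for illustration; the text states only that the sums are weighted.

```latex
\begin{align}
E(\gamma)   &= \lambda_i \, E_i(\gamma) + \lambda_s \, E_s(\gamma), \\
E_s(\gamma) &= \lambda_{sm} \, E_{sm}(\gamma) + \lambda_{sp} \, E_{sp}(\gamma).
\end{align}
```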
The shape prior information allows overlapping nuclei to be extracted properly according to their expected shape, without arbitrarily discarding the overlapping parts. The detection of nuclei is performed by a marked point process (MPP) model, details of which can be found in [4]. An empirical study in [5] shows that this particular combination of MPP and AC outperforms other state-of-the-art methods for nuclei detection and extraction.

Results and discussion
The training set used for the LDA consists of six 1024×1024 images in which the nuclei have been manually delineated by a pathologist. Object and background parameters used in the AC model are also calculated from the training set. Weight parameters for the energy terms in the AC model are adjusted with a grid search. Images used for training are distinct from the images used for validation. Figure 2 shows results obtained with the AC model applied to the probability map side-by-side with results obtained with the same AC model applied to the original image (in fact, the slightly better performing haematoxylin image from the color deconvolution was used instead of the red channel of the RGB image commonly used in other methods [6]). On the original image, the contours have a tendency to match irregularities within the cell nuclei rather than their actual boundaries. This problem is largely alleviated by using the probability map where the