
Automated region of interest retrieval and classification using spectral analysis


Efficient use of whole slide imaging in pathology requires automated region of interest (ROI) retrieval and classification through image analysis and data sorting tools. One possible data sorting method uses spectral analysis for dimensionality reduction. We present results in the fields of histopathology and cytohematology.

In histopathology, we developed a computer-aided diagnosis system applied to low-resolution images of entire histological breast tumour sections. The images can be digitized directly at low resolution or obtained by sub-sampling high-resolution virtual slides. Spectral analysis is used (1) for image segmentation (stroma, tumour epithelium), by determining a "distance" between all the images of the database, (2) for choosing representative images and characteristic patterns of each histological type in order to index them, and (3) for visualizing images or features similar to a sample provided by the pathologist.

In cytohematology, we studied a blood smear virtual slide acquired by high-resolution oil-immersion scanning. Spectral analysis is used to sort selected classes of nucleated blood cells, so that the pathologist can easily focus on specific classes whose morphology can then be studied more carefully, or which can be analyzed with complementary instruments such as multispectral imaging or Raman microspectroscopy.


Efficient use of whole slide imaging (WSI) in pathology requires automated region of interest (ROI) retrieval and classification; this can be achieved through image segmentation and data sorting tools. The present paper illustrates, through two examples, the power of spectral analysis, which can be used alone or alongside image segmentation for data reduction, feature classification and image visualisation.

Materials and methods


The first application concerns 73 WSI of HES-stained histological sections of breast tumours, recorded at a resolution of 6.3 μm/pixel. The second concerns a WSI of an MG-stained blood smear, recorded at a resolution of 0.17 μm/pixel using an Aperio slide scanner.

A minimal segmentation was performed to isolate breast tumour tissue or to eliminate erythrocytes from the blood smear.

Principle of spectral analysis

The main idea of this technique is to introduce a useful metric on the data set, based on the connectivity of points within the graph of the data, and to provide coordinates on the data set that reorganize the points according to this metric [1, 2]. Let $X = \{x_1, x_2, \dots, x_N\}$ be $N$ data points (images), each $x_i \in \mathbb{R}^n$, where $n$ is the dimension of the data space (the number of measures). The first step is to represent the dataset $X$ by a weighted symmetric graph $G = (V, E)$ in which each data point $x_i$ corresponds to a node. Two nodes $x_i$ and $x_j$ are connected by an edge with weight $w(x_i, x_j) = w(x_j, x_i)$, reflecting the degree of similarity (or affinity) between the two points. The weight $w(\cdot,\cdot)$ describes the first-order interaction between the data points, and its choice is application-driven. For instance, in applications where a distance $d(\cdot,\cdot)$ already exists on the data, it is customary to weight the edge between $x_i$ and $x_j$ by:

$$w(x_i, x_j) = \exp\left(-\frac{d(x_i, x_j)^2}{\varepsilon}\right)$$

where $\varepsilon > 0$ is a scale parameter; other weighting functions can also be used.
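As a minimal sketch, assuming each image has already been reduced to a feature vector $x_i \in \mathbb{R}^n$, that $d(\cdot,\cdot)$ is the Euclidean distance (the paper leaves this choice application-driven) and that the weight is the Gaussian kernel $\exp(-d(x_i,x_j)^2/\varepsilon)$, the full weight matrix $W$ can be built with NumPy broadcasting. The function name `gaussian_affinity` and the toy points are hypothetical.

```python
import numpy as np

def gaussian_affinity(X: np.ndarray, eps: float) -> np.ndarray:
    """Return the symmetric weight matrix W with W[i, j] = exp(-d(x_i, x_j)^2 / eps).

    X is an (N, n) array: N data points (e.g. images), each described by n measures.
    The Euclidean distance is assumed here; another distance could be substituted.
    """
    if eps <= 0:
        raise ValueError("the scale parameter eps must be > 0")
    # Pairwise squared Euclidean distances via broadcasting: shape (N, N).
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / eps)

# Three points on a line: the two close points get a much larger weight.
X = np.array([[0.0], [1.0], [5.0]])
W = gaussian_affinity(X, eps=2.0)
print(np.round(W, 3))
```

The scale parameter ε sets how quickly similarity decays with distance: if it is too small the graph becomes nearly disconnected, and if it is too large all points look alike.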

Following a classical construction in spectral graph theory and manifold learning, we now create a random walk on the data set X by forming the kernel:

$$p(x_i, x_j) = \frac{w(x_i, x_j)}{d(x_i)}$$

where

$$d(x_i) = \sum_{x_k \in X} w(x_i, x_k)$$

is the degree of node $x_i$.
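In matrix form, with $W$ the $N \times N$ weight matrix and $D$ the diagonal matrix of degrees $d(x_i)$, this kernel is $P = D^{-1}W$: each row of $W$ is divided by its row sum. A minimal NumPy sketch; the function name and the 3-node weight matrix are hypothetical, for illustration only.

```python
import numpy as np

def transition_matrix(W: np.ndarray) -> np.ndarray:
    """Random-walk kernel p(x_i, x_j) = w(x_i, x_j) / d(x_i).

    W must be a symmetric, non-negative (N, N) weight matrix.
    """
    degree = W.sum(axis=1)              # d(x_i) = sum_k w(x_i, x_k)
    if np.any(degree <= 0):
        raise ValueError("every node needs at least one positive-weight edge")
    return W / degree[:, None]          # divide row i by d(x_i)

# Hypothetical 3-node weight matrix.
W = np.array([[1.0, 0.5, 0.1],
              [0.5, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
P = transition_matrix(W)
print(P.sum(axis=1))   # each row sums to 1 (up to rounding)
```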

Since $p(x_i, x_j) \ge 0$ and

$$\sum_{x_j \in X} p(x_i, x_j) = 1,$$

the quantity $p(x_i, x_j)$ can be interpreted as the probability that a random walker jumps from $x_i$ to $x_j$ in a single time step.
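Because each row of $P$ is a probability distribution, the random walk can be simulated directly, and the probability of going from $x_i$ to $x_j$ in $t$ steps is the $(i, j)$ entry of the matrix power $P^t$. The sketch below is illustrative only: the function name `walk` and the 3-node weight matrix are hypothetical.

```python
import numpy as np

def walk(P: np.ndarray, start: int, steps: int, rng: np.random.Generator) -> list[int]:
    """Simulate a random walker on the graph: at each step, jump from node i
    to node j with probability P[i, j]."""
    path = [start]
    for _ in range(steps):
        path.append(int(rng.choice(len(P), p=P[path[-1]])))
    return path

W = np.array([[1.0, 0.5, 0.1],
              [0.5, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
P = W / W.sum(axis=1, keepdims=True)   # p(x_i, x_j) = w(x_i, x_j) / d(x_i)

rng = np.random.default_rng(0)
print(walk(P, start=0, steps=5, rng=rng))

# t-step transition probabilities are the entries of P^t; its rows still sum to 1.
P3 = np.linalg.matrix_power(P, 3)
print(np.round(P3, 3))
```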

From spectral theory and harmonic analysis, the eigenfunctions of this random walk (the eigenvectors of the transition matrix $P = [p(x_i, x_j)]$) can be interpreted as a generalization of Fourier harmonics on the manifold defined by the data points. In our problem, smaller eigenvalues correspond to higher-frequency eigenfunctions, and larger eigenvalues to lower-frequency ones.
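A common way to compute these eigenvectors (an implementation choice, not necessarily the authors') is to diagonalise the symmetric matrix $S = D^{-1/2} W D^{-1/2}$, which has the same eigenvalues as $P = D^{-1}W$; if $Sv = \lambda v$, then $\psi = D^{-1/2}v$ satisfies $P\psi = \lambda\psi$. The toy example below uses hypothetical, evenly spaced points on a line with Gaussian weights, and counts sign changes as a rough "frequency" measure: the eigenvector for eigenvalue 1 is constant, and the following ones oscillate more as the eigenvalue decreases.

```python
import numpy as np

def diffusion_eigenpairs(W: np.ndarray):
    """Eigenvalues and right eigenvectors of P = D^-1 W, sorted by decreasing eigenvalue.

    P is not symmetric, but it is similar to S = D^-1/2 W D^-1/2, which is;
    diagonalising S with eigh gives real eigenvalues, and psi = D^-1/2 v
    are the corresponding eigenvectors of P.
    """
    inv_sqrt_d = 1.0 / np.sqrt(W.sum(axis=1))
    S = inv_sqrt_d[:, None] * W * inv_sqrt_d[None, :]
    eigvals, V = np.linalg.eigh(S)          # ascending order
    order = np.argsort(eigvals)[::-1]       # largest eigenvalue first
    psi = inv_sqrt_d[:, None] * V[:, order]
    return eigvals[order], psi

def sign_changes(v: np.ndarray) -> int:
    """Number of sign changes along the vector: a rough 'frequency' measure."""
    return int(np.sum(np.diff(np.sign(v)) != 0))

# Points regularly spaced on a line: a toy one-dimensional "manifold".
x = np.linspace(0.0, 1.0, 20)[:, None]
W = np.exp(-((x - x.T) ** 2) / 0.01)
lam, psi = diffusion_eigenpairs(W)

print(np.round(lam[:4], 3))                            # lam[0] is 1: constant eigenvector
print([sign_changes(psi[:, k]) for k in range(1, 4)])  # expected: [1, 2, 3]
```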

The eigenvalues and eigenvectors provide embedding coordinates for the set $X$: the data points can be mapped into a Euclidean space via an embedding built from these eigenvectors.

The second eigenvector, $\psi_2$, is known as the Fiedler vector and can be used to order the underlying dataset $X$ (for segmentation and data reduction). Combined with the third eigenvector, $\psi_3$, it allows a two-dimensional visualization of the database.
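A minimal sketch of how $\psi_2$ can order a dataset and how $(\psi_2, \psi_3)$ give display coordinates, assuming Euclidean feature vectors and Gaussian weights $\exp(-d^2/\varepsilon)$. The function name, the value of ε and the toy data are hypothetical. On points that lie along a line but are given in scrambled order, sorting by the Fiedler vector recovers the underlying order, up to a reversal, since the sign of an eigenvector is arbitrary.

```python
import numpy as np

def fiedler_and_embedding(X: np.ndarray, eps: float):
    """Return psi_2 (Fiedler vector) and the 2-D coordinates (psi_2, psi_3)
    for the data points in the (N, n) array X."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / eps)                      # Gaussian weights
    inv_sqrt_d = 1.0 / np.sqrt(W.sum(axis=1))
    S = inv_sqrt_d[:, None] * W * inv_sqrt_d[None, :]
    _, V = np.linalg.eigh(S)                        # ascending eigenvalues
    psi = inv_sqrt_d[:, None] * V[:, ::-1]          # eigenvectors of P, largest first
    return psi[:, 1], psi[:, 1:3]

# Hypothetical data: 1-D measurements given in scrambled order.
rng = np.random.default_rng(1)
values = rng.permutation(np.linspace(0.0, 1.0, 15))
X = values[:, None]

fiedler, coords = fiedler_and_embedding(X, eps=0.05)
order = np.argsort(fiedler)        # ordering of the dataset along psi_2
print(np.round(values[order], 2))  # monotone (increasing or decreasing)
print(coords.shape)                # one 2-D point per item, e.g. for a scatter plot
```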


Breast cancer

In this case, the ψ2 eigenvector was used to segment tumour tissue into two classes, stroma and epithelial zones (Figure 1), and the method was applied to all the images of the database. Spectral analysis was then used to select the most representative epithelial zone patches of each histological type (Figure 2). Finally, ψ2 and ψ3 were used to sort the data and to visualize each patch together with its neighbourhood, so as to present the most similar patches (Figure 3).
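The retrieval step (showing a patch together with its most similar neighbours) can be sketched as a nearest-neighbour search in the (ψ2, ψ3) plane. Defining "neighbourhood" this way is an assumption, and the function name and coordinates below are hypothetical; in practice the coordinates would come from the eigenvectors computed on the patch database.

```python
import numpy as np

def most_similar(coords: np.ndarray, query: int, k: int) -> np.ndarray:
    """Indices of the k items closest to item `query` in the spectral
    embedding (e.g. the (psi_2, psi_3) plane), nearest first, excluding the query."""
    dist = np.linalg.norm(coords - coords[query], axis=1)
    order = np.argsort(dist, kind="stable")
    return order[order != query][:k]

# Hypothetical (psi_2, psi_3) coordinates of five patches.
coords = np.array([[0.10, 0.00],
                   [0.12, 0.01],
                   [-0.30, 0.20],
                   [0.07, -0.02],
                   [-0.28, 0.25]])
print(most_similar(coords, query=0, k=2))   # [1 3]
```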

Figure 1

Result of breast virtual slide (VS) segmentation by spectral analysis: (a) visualization of the data sorting, allowing segmentation of (b) the original image; (c) result of the segmentation.

Figure 2

Selection of the most representative (□) epithelial zone patches of each histological type.

Figure 3

Visualization of each patch and its neighbourhood in order to exhibit the most similar patches.

Blood smears

For this application, spectral analysis was used to "segment", by data sorting, the database of isolated blood cell images into two classes: polymorphonuclear cells and lymphocytes (Figure 4).
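A two-class partition by data sorting can be sketched as follows, under two assumptions not stated in the paper: each isolated cell is described by a numeric feature vector, and the classes are separated at the sign change of ψ2. The feature values are synthetic and the function name is hypothetical.

```python
import numpy as np

def spectral_bipartition(X: np.ndarray, eps: float) -> np.ndarray:
    """Split the rows of X into two classes by the sign of the Fiedler vector psi_2."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / eps)                # Gaussian weights
    inv_sqrt_d = 1.0 / np.sqrt(W.sum(axis=1))
    S = inv_sqrt_d[:, None] * W * inv_sqrt_d[None, :]
    _, V = np.linalg.eigh(S)                  # ascending eigenvalues
    fiedler = inv_sqrt_d * V[:, -2]           # psi_2: second-largest eigenvalue
    return (fiedler > 0).astype(int)

# Hypothetical 2-D cell features forming two well-separated groups
# (standing in for, e.g., lymphocytes and polymorphonuclear cells).
rng = np.random.default_rng(2)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(20, 2))
group_b = rng.normal(loc=[3.0, 3.0], scale=0.3, size=(20, 2))
X = np.vstack([group_a, group_b])

labels = spectral_bipartition(X, eps=1.0)
print(labels[:20], labels[20:])   # each group receives a single label
```

Which group gets label 0 and which gets label 1 is arbitrary, because the sign of an eigenvector is not fixed.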

Figure 4

Result of the "segmentation" of the isolated white blood cell database by spectral analysis: (a) visualization of the data sorting (lymphocytes shown in blue), allowing the partition of (b) the database of isolated cells; (c) view of the isolated cells sorted by spectral analysis.


Spectral analysis is a promising approach for computer-aided diagnosis of cancers (automated global analysis of histological tumour sections) as well as for the automated sorting of isolated cells. The resulting concentration of objects of interest allows the pathologist to focus on specific regions whose morphology can then be studied more carefully or analyzed with complementary instruments, such as multispectral imaging or Raman spectroscopy.


  1. He X, Ma WY, Zhang H: ImageRank: spectral techniques for structural analysis of image database. Proceedings of the 2003 International Conference on Multimedia and Expo (ICME '03), 6–9 July 2003, 1: 25-28.

  2. Coifman RR, Lafon S, Lee AB, Maggioni M, Nadler B, Warner F, Zucker SW: Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps. PNAS. 2005, 102 (21): 7426-7431.



The authors gratefully acknowledge Dr Paulette Herlin, Dr Benoît Plancoulaine, Dr Jacques Chasle, Dr Georges Flandrin, the Regional Council of Lower Normandy, the "Comité départemental du Calvados de la Ligue de Lutte Contre le Cancer" and the General Council of "Hauts-de-Seine".

This article has been published as part of Diagnostic Pathology Volume 3 Supplement 1, 2008: New trends in digital pathology: Proceedings of the 9th European Congress on Telepathology and 3rd International Congress on Virtual Microscopy. The full contents of the supplement are available online at

Author information



Corresponding author

Correspondence to Myriam Oger.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Oger, M., Belhomme, P., Klossa, J. et al. Automated region of interest retrieval and classification using spectral analysis. Diagn Pathol 3 (Suppl 1), S17 (2008).

