# How to measure diagnosis-associated information in virtual slides

*Diagnostic Pathology* **volume 6**, Article number: S9 (2011)

## Abstract

The distribution of diagnosis-associated information in histological slides is often spatially dependent. A reliable selection of the slide areas that contain the most significant information for deriving the associated diagnosis is a major task in virtual microscopy. Three different algorithms can be used to select the appropriate fields of view: 1) object dependent segmentation combined with graph theory; 2) time series associated texture analysis; and 3) geometrical statistics based upon geometrical primitives. These methods can be applied by a sliding technique (i.e., field of view selection with fixed frames) and by cluster analysis. The implementation of these methods requires a standardization of images in terms of vignette correction and gray value distribution as well as determination of the appropriate magnification (method 1 only). A principal component analysis of the color space can significantly reduce the necessary computation time. Method 3 is based upon gray value dependent segmentation followed by graph theory application using the construction of the (associated) minimum spanning tree and Voronoi’s neighbourhood condition. The three methods have been applied to large sets of histological images comprising different organs (colon, lung, pleura, stomach, thyroid) and different magnifications. The trials resulted in a reproducible and correct selection of fields of view for all three methods. The different algorithms can be combined into a basic technique of field of view selection, and a general theory of “image information” can be derived. The advantages and constraints of the applied methods will be discussed.

## Introduction

Virtual microscopy, that is, working with virtual slides, can be performed in two different ways: 1) interactive virtual microscopy and 2) automated virtual microscopy [1, 2]. Interactive virtual microscopy translates the pathologist’s work with conventional glass slides into the digital world and leaves all work at the microscope to the pathologist, including slide navigation, magnification, illumination, focus, etc. Some digital features might be added, especially the simultaneous viewing of different slides, automated storage of areas of interest (with inbuilt expert consultation), or creation of labels. Automated virtual microscopy tries to transfer as many functions as possible to the computer, with the final aim that the system evaluates and proposes the most likely diagnoses [3–5]. Such a system must translate all items of the pathologist’s work into computerized algorithms. These do not necessarily have to work in a fully compatible manner; however, they must contain modules that reflect the corresponding work of the pathologist [6, 7]. These modules will probably work in a “time sequence order” and include, in addition to statistical procedures and classifiers, tools that provide a reproducible and constant image quality; object-, structure-, and texture-related magnifications; image analysis procedures; and field of interest recognition programs.

We want to describe some basic ideas and information recognition algorithms in image analysis that can be used for field of view detection in virtual slides, that is, for determining the position and size of the image compartments that possess the strongest association with the underlying disease.

## Basic assumptions

The pathologist’s work is to derive a diagnosis from a microscopic image, which amounts to an image analysis algorithm combined with external (clinical) data [1, 8, 9]. The pathologist’s view focuses on specific biologically meaningful objects and their spatial arrangement (structure), which include a) normal objects (cells, nuclei, etc.), b) abnormal objects (cancer cells, etc.), c) external objects (bacteria, parasites, silica, etc.), d) preserved structure with abnormal cellular societies (inflammatory infiltrates, fibrosis, etc.), e) destroyed structure (granuloma, necrosis, etc.), and f) abnormal structures (adenocarcinoma, sarcomatous growth pattern) [10–13]. A diagnosis can be derived from a histological image by recognition and classification of the objects, the structures they form, and their spatial arrangement. It is useful to introduce different levels of structure in order to describe, for example, the infiltration of lymphocytes into a vascular wall (a vessel is of higher order than a lymphocyte because a vessel is built by a cellular sociology including endothelial cells, smooth muscle cells, a basal membrane, etc.). The details of this concept have been described in Kayser et al. [10, 14–16].

The term information is derived from the Latin word informare, which means “create by teaching”; in other words, it denotes a communication procedure between a source (image) and the (understanding) receiver (pathologist). Shannon has analyzed the specific conditions of information transfer and content [17, 18]. According to his theory, information limits the broad variety of reactions of an (understanding) receiver to only one or a few appropriate ones. In other words, information is a statistical property and can be analyzed by statistical methods. Shannon introduced the term entropy, which is derived from classic thermodynamics, as the principal measure of information [17, 18]. Entropy is a measure of the distance of a statistical population from its end stage using Kolmogorov’s axiomatic approach of non-overlapping elementary events that are characterized by a probability 0 ≤ p ≤ 1.

Entropies ($E = \sum_i p_i \ln p_i$) of different systems can simply be added ($E_s = \sum_i E_i$) if there is no correlation between the elements of the different systems (so-called strong chaos); otherwise the more general Tsallis entropy has to be used ($E_s = E_q(1) + E_q(2) + (1-q)\,E_q(1)\,E_q(2)$) [19–21].
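The two composition rules can be checked numerically. The following is a minimal sketch, not taken from the article: the function names and the example distributions are hypothetical, the Shannon term follows the sign convention used above (the sum of p·ln p, hence values ≤ 0), and the Tsallis entropy is assumed to take its standard form (1 − Σ p^q)/(q − 1), which the text does not spell out.

```python
import math
from itertools import product


def shannon(p):
    """Entropy in the article's sign convention: sum of p * ln(p), always <= 0."""
    return sum(x * math.log(x) for x in p if x > 0)


def tsallis(p, q):
    """Standard Tsallis entropy (assumed form): (1 - sum p^q) / (q - 1), q != 1."""
    return (1.0 - sum(x ** q for x in p)) / (q - 1.0)


def joint(p, r):
    """Joint distribution of two uncorrelated systems: all pairwise products."""
    return [a * b for a, b in product(p, r)]


# Hypothetical example distributions of two independent systems.
system_a = [0.5, 0.3, 0.2]
system_b = [0.6, 0.4]
q = 1.5

# Plain additivity for the Shannon-type entropy.
shannon_gap = shannon(joint(system_a, system_b)) - (shannon(system_a) + shannon(system_b))

# Composition rule with the extra (1 - q) product term for the Tsallis entropy.
ea, eb = tsallis(system_a, q), tsallis(system_b, q)
tsallis_gap = tsallis(joint(system_a, system_b), q) - (ea + eb + (1 - q) * ea * eb)

print(shannon_gap, tsallis_gap)  # both gaps vanish up to rounding error
```

For q approaching 1 the additional product term vanishes, and the composition rule reduces to simple addition.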

## Macro- and microstages

The basic elements of a system characterized by pi might be equally distributed in the system’s space, or aggregate into certain formations which can be considered as a “subspace”. These formations are called macrostages. One can define the macrostages as new (higher order) events and calculate the entropy of the original system based upon the macrostages and their internal entropy [22, 23]. The number of microstages gives the maximum number of potential macrostages. An illustrative example is shown in figure 1, which is described in detail in [22]. The letters {T,H,I,S} are the microstages, and the words {THIS, IS, ISIS} form the macrostages. The sequence of the macrostages forms the structure.

The calculated Shannon entropies of the macrostages within the system (This is Isis) result in:

This: [-0.92]; is: [-0.46]; Isis: [-0.64]; ∑ = -2.02

that of the total system without macrostages {this is Isis} = -1.58,

and based upon the macrostages alone {[this] [is] [Isis]} = -1.08

The calculated probability of the macrostages based upon their internal entropies results in:

P(this) = 1.92/5.02 = 0.38

P(is) = 1.46/5.02 = 0.29

P(Isis) = 1.64/5.02 = 0.33

This is Isis: E = {0.38 * ln 0.38 + 0.29 * ln 0.29 + 0.33 * ln 0.33} = -1.09
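The arithmetic of this step can be retraced in a few lines. This is only a sketch: the weighting 1 + |E| is inferred from the printed numerators (1.92, 1.46, 1.64) rather than stated in the text, and the function names are hypothetical.

```python
import math

# Internal entropies of the macrostages as quoted above.
internal_entropies = {"this": -0.92, "is": -0.46, "Isis": -0.64}


def macrostage_probabilities(entropies):
    """Normalize the weights 1 + |E| (assumed from the printed numerators)."""
    weights = {name: 1.0 + abs(e) for name, e in entropies.items()}
    total = sum(weights.values())  # 5.02 for the example
    return {name: w / total for name, w in weights.items()}


def entropy(probabilities):
    """Sum of p * ln(p), the sign convention used in the article."""
    return sum(p * math.log(p) for p in probabilities if p > 0)


probs = macrostage_probabilities(internal_entropies)
for name, p in probs.items():
    print(f"P({name}) = {p:.2f}")
print(f"E = {entropy(probs.values()):.2f}")
```

Run as written, the sketch yields the rounded probabilities 0.38, 0.29 and 0.33 and an entropy of -1.09, in agreement with the values listed above.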

The differences between the macrostages are: [-0.46] + [+0.22] = - 0.18.

If we transform the sequence into the question “Is this Isis?”, we will get: [+0.46] + [-0.28] = +0.18.

The calculation of the total entropy of the (macrostage) system depends upon its structure, or, in other words, the calculation of macrostage entropies can be applied in relation to internal structures, such as sequential arrangement or spatial relationships [16, 22, 23].

### Entropy calculation in relation to histological images (virtual slides)

The information that a pathologist can derive from a histological image depends on the presence and spatial arrangement of cells or nuclei. The different cell types that are present in such a slide can be assigned to microstages, and the corresponding disease to macrostages. The microscopic images, the associated diagnoses, and the analyzed microstages are shown in figures 2–4. In total, 15 different cell types and 8 different diseases are taken into account (figure 5). The assumed cellular distributions are given in figure 6, and the computed entropies are shown in figure 7. As expected, notable differences exist between the different images (diseases). They are, however, not striking between quite different diseases, for example between a small cell lung cancer and normal lung parenchyma. The computed entropies can obviously not be translated directly into the biological significance of the corresponding disease.
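A minimal sketch of this calculation is given below. The cell types and counts are purely hypothetical placeholders, not the distributions of figure 6, and the function name is an assumption; only the entropy formula (sum of p·ln p) follows the text.

```python
import math


def cell_type_entropy(counts):
    """Entropy of a cell-type distribution: sum of p * ln(p) over observed types."""
    total = sum(counts.values())
    return sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


# Hypothetical cell counts per field of view (illustration only).
normal_lung = {
    "pneumocyte type I": 40,
    "pneumocyte type II": 20,
    "endothelial cell": 25,
    "macrophage": 10,
    "lymphocyte": 5,
}
small_cell_carcinoma = {"tumour cell": 85, "lymphocyte": 10, "macrophage": 5}

for name, counts in [("normal lung", normal_lung),
                     ("small cell carcinoma", small_cell_carcinoma)]:
    print(f"{name}: E = {cell_type_entropy(counts):.2f}")
```

The entropy depends only on the relative frequencies, not on which cell types occur, which is one reason why such values cannot be read directly as a measure of biological significance.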

## How to refine the entropy approach?

### Definition of image associated macro- and microstages

All (interactive) diagnostic information of a digitized image is derived from biologically meaningful objects such as cells, nuclei, mitoses, vessels, etc. In other words, an analysis of the image information results in a meaning, which is a probability function of the (predefined) diagnoses and the image information. The higher the probability, the more accurate the diagnosis. The advantage of such an algorithm is the relative constancy of objects (and derived information) compared to the broad variation of images belonging to the same diagnosis [3, 8].

We can consider image information as an entity that is primarily separate from the set of diagnoses. This implies that image information can be described as a mapping of diagnoses M(D) onto the image pixels {p(x,y,g)} with

M({Di},P) -> p{px(x,y,g)} with p{px(x,y,g)} = maximum

for the (evaluated) diagnosis D.

Using the entropy approach, we create an n-dimensional space of elementary image events and analyse their distribution in the different diseases or macrostages. It would be of formal advantage if we could define certain elementary events that are independent of the associated meaning, i.e. independent of external knowledge. In fact, this is possible by application of stochastic geometry, which has been described by Stoyan et al. [24].

Naturally, one could use the pixels as elementary events and associated spectral functions in order to create the set of elementary events. However, this approach would leave us again with the problem of handling broad image variance and low probability levels.

The basic elements (or image primitives) can also be calculated by introducing a (spatial) relationship function. It is usually called a neighbourhood condition, such as Voronoi’s or O’Callaghan’s condition [25–27]. The simplest case is a neighbourhood function f(x,y) with

f(x,y) = 1 iff g(x,y) > threshold and (g(x+1,y) > threshold or g(x,y+1) > threshold), and f(x,y) = 0 otherwise; i.e., two pixels are neighbors iff both of them possess a gray value above a certain predefined threshold (or within a predefined bandwidth of gray values). Naturally, a negative definition can also be applied (< threshold).
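The simplest neighbourhood function can be sketched as follows. The function name, the list-of-rows image layout (indexed as gray[y][x]), and the treatment of pixels beyond the image border as "not above threshold" are assumptions made for illustration.

```python
def neighbourhood(gray, x, y, threshold):
    """Return 1 if pixel (x, y) and its right or lower neighbour exceed threshold."""
    height, width = len(gray), len(gray[0])
    if gray[y][x] <= threshold:
        return 0
    # Pixels outside the image are treated as "not above threshold" (assumption).
    right = x + 1 < width and gray[y][x + 1] > threshold
    below = y + 1 < height and gray[y + 1][x] > threshold
    return 1 if (right or below) else 0


# Tiny hypothetical gray value image, two rows by three columns.
image = [
    [200, 210, 10],
    [20, 30, 220],
]

neighbour_map = [[neighbourhood(image, x, y, 128) for x in range(3)] for y in range(2)]
print(neighbour_map)
```

The negative definition mentioned above would be obtained by reversing the comparisons, and a bandwidth by testing whether the gray value lies between a lower and an upper limit.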

This definition allows us to define a set of primitive elements that form an object, i.e. an elementary unit of image information (object, structure, texture).

The different primitive elements include

- Isolated points (i.e. pixels without neighbors)
- Fibers (pixels possessing a “line” of neighboring pixels, and different start and end pixels)
- Circles (pixels possessing a line of neighboring pixels, and identical start and end pixel)
- Plateaus (a set of pixels with a number of neighboring pixels > 2 and connected points or lines).

Any biologically meaningful object can be broken down into a set of these four primitives; for example, a membrane consists of a line or a circle, a nucleus of a circle and at least one plateau, a non-completely segmented nucleus of lines, points, and plateaus, etc.
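One possible way to assign thresholded pixels to the four primitives is sketched below. It is a simplification under stated assumptions, not the implementation used in the study: pixels are grouped into 4-connected components of a binary mask, and each component is named by the number of neighbors its pixels have. All names are hypothetical, and mixed components (for example a plateau with an attached line) are simply reported as plateaus.

```python
from collections import deque


def classify_primitives(mask):
    """Name each 4-connected component of a binary mask as one primitive."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]

    def neighbours(r, c):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and mask[rr][cc]:
                yield rr, cc

    labels = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one connected component.
            queue, component = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                cur = queue.popleft()
                component.append(cur)
                for nr, nc in neighbours(*cur):
                    if not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            degrees = [sum(1 for _ in neighbours(*p)) for p in component]
            if len(component) == 1:
                labels.append("point")    # pixel without neighbors
            elif max(degrees) > 2:
                labels.append("plateau")  # some pixel has more than 2 neighbors
            elif min(degrees) == 1:
                labels.append("fiber")    # open line with two distinct end pixels
            else:
                labels.append("circle")   # closed line, every pixel has 2 neighbors
    return labels


mask = [
    [1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 1],
    [1, 0, 1, 0, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
]
print(classify_primitives(mask))  # ['point', 'fiber', 'circle', 'plateau']
```

Components are reported in scan order (top to bottom, left to right). A real object such as a nucleus would have to be decomposed further into its constituent circle and plateau parts, which this sketch does not attempt.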

In a potential clinical application, this approach has to work with approximately 800 different macrostages (lung diseases, derived from [28]) and 10,000 different features (see figure 8). To discriminate between different macrostages with a significance of s > 0.95, only 55 of the 1,100 available features per macrostage would be necessary.

## Implementation

The selection of an appropriate threshold and/or bandwidth of the gray values as well as the image size in pixels are parameters that directly influence the implementation of this algorithm. Therefore, it has been tested on automatically selected areas of interest which have been determined by analysis of texture and object features, as described elsewhere. Within the selected areas of interest, the chosen threshold has only limited influence on the expression of the elementary primitives, in contrast to the whole image (see figure 9). Thus, when working in correctly selected areas of interest, a threshold can be chosen within a broad range without falsifying the results. On the other hand, the described technique might be useful to check the correctness of the selected field of view. An approach to finally classify diseases by the described algorithm is in preparation.

## Discussion

The development of reliable and practice-oriented scanners which scan whole glass slides has opened a new door in diagnostic surgical pathology or tissue-based diagnosis [1, 3, 5, 6, 29–31]. The mechanical and optical problems are solved to the extent that the new scanner generation can be successfully implemented into the workflow of routine diagnosis [9]. The next step is to open up new and attractive functions of these systems. These will include the mandatory replacement and improvement of classic microscope handling, the implementation of new viewing and measurement functions, as well as the search for automated diagnosis systems. They will probably start with the implementation of so-called assistants that guide the pathologist through all the possible tools. As in all such trends, the final aim would probably be an automated diagnosis system, which the pathologist has to control, and which might at a much later stage control itself.

In this article we describe only one of the possible ways to build and to implement such a system. Other algorithms have been successfully tested too [32–34]. The main idea is that we try to separate the different functions that are used in the pathologist’s thinking and diagnostics, and not to be confused by the simultaneous application of algorithms that are in principle separate. When in the Middle Ages some ingenious people tried to copy the flight of birds directly, they failed because they did not separate the lifting forces from the forward movement (velocity). The separation of both forces led to the successful development of airplanes, which had long been thought impossible.

We have shown the reader an approach that in a similar manner separates the information given in an image from its evaluation and interpretation by a pathologist based upon a known classification of information (diseases). With the approach tested, a more generalized theory of transforming information into knowledge and competence in virtual microscopy is called for.

## Acknowledgement

The financial support of the Verein zur Förderung des biologisch technologischen Fortschritts in der Medizin e.V. is gratefully acknowledged.

This article has been published as part of *Diagnostic Pathology* Volume 6 Supplement 1, 2011: Proceedings of the 10th European Congress on Telepathology and 4th International Congress on Virtual Microscopy. The full contents of the supplement are available online at http://www.diagnosticpathology.org/supplements/6/S1

## References

1. Kayser K, Molnar B, Weinstein RS: Virtual Microscopy - Fundamentals - Applications - Perspectives of Electronic Tissue-based Diagnosis. 2006, VSV Interdisciplinary Medical Publishing
2. Weinstein RS: Innovations in medical imaging and virtual microscopy. Hum Pathol. 2005, 36 (4): 317-9. 10.1016/j.humpath.2005.03.007.
3. Kayser K, et al: Towards an automated virtual slide screening: theoretical considerations and practical experiences of automated tissue-based virtual diagnosis to be implemented in the Internet. Diagn Pathol. 2006, 1: 10. 10.1186/1746-1596-1-10.
4. Kepper N: Visualization, Analysis, and Design of COMBO-FISH Probes in the Grid-Based GLOBE 3D Genome Platform. Stud Health Technol Inform. 159: 171-80.
5. Marchevsky AM, et al: The use of virtual microscopy for proficiency testing in gynecologic cytopathology: a feasibility study using ScanScope. Arch Pathol Lab Med. 2006, 130 (3): 349-55.
6. Merk MR, Knuechel R, Perez-Bouza A: Web-based virtual microscopy at the RWTH Aachen University: Didactic concept, methods and analysis of acceptance by the students. Ann Anat.
7. Schrader T, et al: The diagnostic path, a useful visualisation tool in virtual microscopy. Diagn Pathol. 2006, 1: 40. 10.1186/1746-1596-1-40.
8. Kayser K, et al: Digitized pathology: theory and experiences in automated tissue-based virtual diagnosis. Rom J Morphol Embryol. 2006, 47 (1): 21-8.
9. Lundin M, et al: A European network for virtual microscopy--design, implementation and evaluation of performance. Virchows Arch. 2009, 454 (4): 421-9. 10.1007/s00428-009-0749-3.
10. Bartels P, Weber J, L D: Machine learning in quantitative histopathology. Anal Quant Cytol Histol. 1988, 10: 299-306.
11. Bartels PH, Vooijs GP: Automation of primary screening for cervical cancer. Sooner or later?. Acta Cytol. 43 (1): 7-12.
12. Gabril MY, Yousef GM: Informatics for practicing anatomical pathologists: marking a new era in pathology practice. Mod Pathol. 23 (3): 349-58. 10.1038/modpathol.2009.190.
13. Giansanti D: Virtual microscopy and digital cytology: state of the art. Ann Ist Super Sanita. 46 (2): 115-22.
14. Kayser K: Application of structural pattern recognition in histopathology, in Syntactic and structural pattern recognition, T.P. Edited by: G. Ferraté, A. Sanfeliu, H. Bunke. 1988, Springer: Berlin Heidelberg New York, 115-135.
15. Kayser K, et al: Application of attributed graphs in diagnostic pathology. Anal Quant Cytol Histol. 1996, 18 (4): 286-92.
16. Kayser K, et al: AI (artificial intelligence) in histopathology--from image analysis to automated diagnosis. Folia Histochem Cytobiol. 2009, 47 (3): 355-61. 10.2478/v10042-009-0087-y.
17. Prigogine I: Introduction to Thermodynamics of Irreversible Processes. 1961, New York: John Wiley & Sons Inc, 2nd ed.
18. Shannon C: A mathematical theory of communication. Bell Sys Tech J. 1948, 27: 379-423.
19. Pincus S: Approximate entropy as a measure of system complexity. Proc Natl Acad Sci U S A. 1991, 88: 2297-2301. 10.1073/pnas.88.6.2297.
20. Tsallis C: Entropic nonextensivity: a possible measure of complexity. Chaos, Solitons and Fractals. 2002, 13: 371-391. 10.1016/S0960-0779(01)00019-4.
21. Tsekouras GA, Tsallis C: Generalized entropy arising from a distribution of q indices. Phys Rev E Stat Nonlin Soft Matter Phys. 71: 46-144.
22. Kayser K, Kayser G, Metze K: The concept of structural entropy in tissue-based diagnosis. Anal Quant Cytol Histol. 2007, 29 (5): 296-308.
23. Voß K: Statistische Theorie komplexer Systeme I. EIK. 1960, 3: 239-244.
24. Stoyan D, Kendall WS, Mecke J: Stochastic Geometry and its Applications. 1987, Berlin: Akademie Verlag
25. O'Callaghan J: An alternative definition for neighborhood of a point. IEEE Trans Comput. 1975, 24: 1121-1125.
26. Voronoi G: Nouvelles applications des paramètres continus à la théorie des formes quadratiques, deuxième mémoire: recherches sur les parallélloèdres primitifs. J Reine Angew Math. 1902, 134: 188-287.
27. Zahn C: Graph-theoretical methods for detecting and describing graph clusters. IEEE Trans Comput. 1971, C-20: 68-86. 10.1109/T-C.1971.223083.
28. Kayser K: Analytical Lung Pathology. 1992, Heidelberg, New York: Springer
29. Kayser K, Kayser G: Virtual Microscopy and Automated Diagnosis, in Virtual Microscopy and Virtual Slides in Teaching, Diagnosis and Research, R.O. Edited by: J. Gu. 2005, Taylor & Francis: Boca Raton
30. Kumar RK, et al: Virtual microscopy for learning and assessment in pathology. J Pathol. 2004, 204 (5): 613-8. 10.1002/path.1658.
31. Yang L, et al: Virtual microscopy and grid-enabled decision support for large-scale analysis of imaged pathology specimens. IEEE Trans Inf Technol Biomed. 2009, 13 (4): 636-44. 10.1109/TITB.2009.2020159.
32. Apfeldorfer C, et al: Object orientated automated image analysis: quantitative and qualitative estimation of inflammation in mouse lung. Diagnostic Pathology. 2008, 3 (Suppl 1): S16. 10.1186/1746-1596-3-S1-S16.
33. Oger M, et al: Automated region of interest retrieval and classification using spectral analysis. Diagnostic Pathology. 2008, 3 (Suppl 1): S17. 10.1186/1746-1596-3-S1-S17.
34. Gilbertson J, Yagi Y: Histology, imaging and new diagnostic work-flows in pathology. Diagnostic Pathology. 2008, 3 (Suppl 1): S14. 10.1186/1746-1596-3-S1-S14.

## Additional information

### Competing interests

The authors declare that they have no competing interests.

## Rights and permissions

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## About this article

### Keywords

- Elementary Event
- Image Information
- Virtual Slide
- Histological Image
- Virtual Microscopy