A rich internet application for remote visualization and collaborative annotation of digital slides in histology and cytology
© Marée et al; licensee BioMed Central Ltd. 2013
Published: 30 September 2013
In digital pathology and biomedical research, there is a strong need for efficient tools to build pathology atlases and to foster collaboration between researchers, pathologists (e.g. for inter-observer concordance studies) and computer scientists (e.g. for the development and extensive validation of novel computer vision algorithms). Although many efforts have been made in virtual microscopy and telepathology in recent years [1–4], many of the resulting frameworks have at least one of the following limitations: they are not fully web-based, which limits collaboration; they are vendor-dependent and therefore limited in the image formats they support; they use proprietary modules that prevent cross-browser compatibility and seamless execution on mobile devices; they offer restricted functionality (e.g. images can only be annotated manually with image-level tags or fixed markers); or their design limits their application domain (e.g. education only, or disease-specific). In this paper, we present a general-purpose, rich internet application that uses recent web technologies and integrates various open-source tools, standards and generic algorithms for remote visualization and collaborative annotation of digital slides.
Our application follows a representational state transfer (REST) architectural style that structures database resources and standardizes communication interfaces. In such a setting, each resource is referenced by a uniform resource locator (URL); resources can be located at different physical sites and can be updated or deleted if necessary. Following these guidelines, we defined a RESTful JSON application programming interface (API) for communication between servers and clients.
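To illustrate the REST principle described above, the following minimal Python sketch maps create/read/update/delete actions onto HTTP verbs and builds one URL per resource. The host name, the `/api/project/.../annotation/...` path layout, the `term` field and the helper names are assumptions made for illustration only; they are not the routes or payloads of the actual application.

```python
import json
from urllib.parse import urljoin

# Conventional mapping of resource actions onto HTTP verbs in a REST style.
ACTION_TO_METHOD = {
    "create": "POST",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


def resource_url(base_url, *segments):
    """Build the URL that identifies one resource (hypothetical path layout)."""
    path = "/".join(str(segment) for segment in segments)
    return urljoin(base_url.rstrip("/") + "/", path)


def build_request(action, base_url, *segments, payload=None):
    """Describe an API call as a plain dict, without sending it."""
    body = json.dumps(payload, sort_keys=True) if payload is not None else None
    return {
        "method": ACTION_TO_METHOD[action],
        "url": resource_url(base_url, *segments),
        "body": body,
    }


# Hypothetical example: update the term attached to annotation 7 of project 42.
request = build_request(
    "update",
    "https://site-a.example.org/api",
    "project", 42, "annotation", 7,
    payload={"term": "bronchus"},
)
```

Because every resource is identified by its own URL, the same helper could target a server at another site simply by changing the base URL.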
On the server side, our underlying data model allows the creation of multiple projects, where each project corresponds to a specific study or experiment. A project is described by a list of authenticated users with permission rights, a list of digital slide images, an ontology definition with domain-specific, user-defined vocabulary terms, and annotations (regions of interest) associated with digital slides and drawn by users. All project data are stored in a spatial, relational database (PostgreSQL with the PostGIS extension). The core of our application uses the Grails framework, which is based on Spring, with the Groovy dynamic programming language for Java and the Hibernate framework with its spatial extension for object/relational mapping.
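The project structure described above can be sketched with plain Python dataclasses. This is only an illustrative model under stated assumptions: the class and field names, the `"write"` permission label and the use of a WKT string for the annotation geometry are hypothetical, and the actual application uses Grails domain classes persisted through Hibernate into PostgreSQL/PostGIS.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Annotation:
    image_id: int
    author: str
    geometry_wkt: str          # region of interest, assumed to be stored as WKT
    terms: List[str]


@dataclass
class Project:
    name: str
    ontology_terms: Set[str]                    # user-defined vocabulary
    permissions: Dict[str, Set[str]]            # username -> rights (hypothetical labels)
    image_ids: Set[int] = field(default_factory=set)
    annotations: List[Annotation] = field(default_factory=list)

    def annotate(self, user, image_id, geometry_wkt, terms):
        """Add an annotation after checking rights, image and vocabulary."""
        if "write" not in self.permissions.get(user, set()):
            raise PermissionError(f"{user} may not annotate in {self.name}")
        if image_id not in self.image_ids:
            raise ValueError(f"image {image_id} is not part of {self.name}")
        unknown = set(terms) - self.ontology_terms
        if unknown:
            raise ValueError(f"terms not in ontology: {sorted(unknown)}")
        annotation = Annotation(image_id, user, geometry_wkt, list(terms))
        self.annotations.append(annotation)
        return annotation


# Hypothetical project with one image, two users and a small ontology.
project = Project(
    name="lung-study",
    ontology_terms={"bronchus", "blood vessel", "cartilage"},
    permissions={"alice": {"read", "write"}, "bob": {"read"}},
    image_ids={1},
)
roi = project.annotate(
    "alice", 1, "POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))", ["bronchus"]
)
```

Keeping the ontology inside the project means that each study can define its own vocabulary, and an annotation can only carry terms from that vocabulary.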
On the client side (i.e. the Web client), the source code follows the model-view-controller design pattern and communicates directly through the API to visualize and edit resources. Data can also be retrieved or updated by third-party computer programs through the API.
We implemented image processing routines and a recent content-based image retrieval (CBIR) algorithm to speed up the exploration and annotation of digital slides. The image processing routines are based on ImageJ/FIJI plugins. They include various image filtering operations (e.g. binarization, splitting color channels, and color deconvolution) that can be applied on-the-fly to image tiles to ease image inspection, and adaptive thresholding operations that can be used to semi-automatically draw annotation geometries around objects of interest. The CBIR algorithm uses random subwindow extraction and vectors of random tests on raw pixel values. It is used to search for visually similar annotations, and it automatically suggests ontology terms through an average voting scheme based on the image similarities computed with cropped images of previously indexed annotations. We implemented the CBIR algorithm using an efficient key-value store based on hash tables (using Kyoto Cabinet or Redis NoSQL databases).
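The retrieval idea described above can be illustrated with a deliberately simplified Python sketch. It is not the authors' implementation: the class name `TinyCBIR`, the fixed subwindow size, the form of each test (one pixel compared with a random threshold), the normalization of the similarity and the in-memory dictionary that stands in for the key-value store are all assumptions. Images are assumed to be grayscale, given as lists of rows of integers in 0..255.

```python
import random
from collections import Counter, defaultdict


class TinyCBIR:
    """Toy index: random subwindows hashed by vectors of random pixel tests."""

    def __init__(self, n_vectors=4, n_tests=8, window=4, n_subwindows=20, seed=0):
        self.rng = random.Random(seed)
        self.window = window
        self.n_subwindows = n_subwindows
        # Each vector is a list of random tests (row, col, threshold) on raw pixels.
        self.vectors = [
            [
                (self.rng.randrange(window), self.rng.randrange(window),
                 self.rng.randrange(256))
                for _ in range(n_tests)
            ]
            for _ in range(n_vectors)
        ]
        # Stand-in for the key-value store: key -> {annotation_id: subwindow count}.
        self.table = defaultdict(Counter)
        self.terms = {}  # annotation_id -> ontology term

    def _keys(self, image):
        """Extract random subwindows and yield one hash key per test vector."""
        height, width = len(image), len(image[0])
        for _ in range(self.n_subwindows):
            top = self.rng.randrange(height - self.window + 1)
            left = self.rng.randrange(width - self.window + 1)
            for v, tests in enumerate(self.vectors):
                bits = tuple(image[top + r][left + c] > t for r, c, t in tests)
                yield (v, bits)

    def index(self, annotation_id, term, image):
        """Store the keys of a cropped annotation image under its identifier."""
        self.terms[annotation_id] = term
        for key in self._keys(image):
            self.table[key][annotation_id] += 1

    def similarities(self, image):
        """Similarity in [0, 1]: fraction of co-occurring subwindow keys."""
        scores = Counter()
        for key in self._keys(image):
            # .get avoids inserting empty entries into the defaultdict.
            for annotation_id, count in self.table.get(key, {}).items():
                scores[annotation_id] += count
        norm = len(self.vectors) * self.n_subwindows ** 2
        return {annotation_id: s / norm for annotation_id, s in scores.items()}

    def suggest_terms(self, image):
        """Rank terms by the average similarity of their indexed annotations."""
        sims = self.similarities(image)
        per_term = defaultdict(list)
        for annotation_id, term in self.terms.items():
            per_term[term].append(sims.get(annotation_id, 0.0))
        ranked = [(term, sum(v) / len(v)) for term, v in per_term.items()]
        return sorted(ranked, key=lambda item: item[1], reverse=True)
```

In this sketch, `index` would be called once per annotated crop and `suggest_terms` once per new region of interest. Because lookups only touch hash keys, the dictionary could in principle be swapped for an external key-value store without changing the voting logic.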
Our application runs in any popular web browser and on mobile devices without the need for proprietary browser add-ons. It has been used for one year by our collaborators at two geographically distant locations over the Internet. About one thousand whole-slide images from lung cancer studies (roughly 1 TB of data), acquired with two slide scanners (Olympus VS100 at 20X magnification and Hamamatsu Nanozoomer 2.0 at 40X magnification), have been uploaded. These include Hematoxylin & Eosin (H&E) stained histology images of experimental mice, and bronchoalveolar lavage (BAL) cytology images. Three ontologies describing various tissue types (e.g. bronchus, blood vessel, cartilage, adenocarcinoma, nodular lymphoid hyperplasia) and various cell types (e.g. squamous epithelial cells, macrophages, eosinophils, neutrophils, mucosecreting cells, ciliated bronchial cells) were defined and used by seven users (pathologists, pneumologists, and technicians) to annotate more than five thousand regions of interest.
Although the amount of data our software already handles is rather large, the wider adoption of digital acquisition equipment is expected to generate much larger datasets. The design of our software allows it to scale to larger sets of images, as most of the components (e.g. image servers and the image retrieval algorithm) can be distributed over multiple machines. It is also important to note that the architecture allows local configurations, i.e. images and data need not be stored on a central, external server but can remain on servers at local institutions, thereby ensuring confidentiality and local administration. Although we do not rely on the latest standard definitions in digital pathology (), we plan to extend our software to support these standards once they are implemented in the field. In the future, our architecture will also allow us to add new image formats without affecting the source code of the core application.
Regarding our preliminary evaluation of the CBIR algorithm, our results are promising for automatic term suggestion, but further validation has to be conducted. Indeed, recognition rates for less frequent object types are lower, stressing the need for more manual annotations with respect to object types (e.g. mucosecreting cells were six times less frequently annotated than ciliated bronchial cells) and acquisition protocols (such as color staining).
The proposed web software is generally applicable, and its methodological choices open the door to large-scale, distributed and collaborative image annotation and exploitation projects. Future work includes the integration and validation of general-purpose machine learning techniques to further facilitate annotation and quantification of specific visual phenotypes and to support their meta-analysis. We also plan to extend our framework to other types of multidimensional imaging data related to other diseases or biological processes, and to adapt it for educational purposes.
This work is funded by the research grant n°1017072 of the Walloon Region (DGO6). RM is also supported by the GIGA with the help of the Walloon Region and the European Regional Development Fund. The CMMI is supported by the European Regional Development Fund and the Walloon Region. XML is supported by the "Télévie" program of the "Fond National de la Recherche Scientifique" (FNRS). We thank Fabienne Perin, Christine Fink, Myriam Remmelink, and Sandrine Rorive for continuous software testing.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.