Open source tools for management and archiving of digital microscopy data to allow integration with patient pathology and treatment information
© Khushi et al.; licensee BioMed Central Ltd. 2013
Received: 20 November 2012
Accepted: 5 February 2013
Published: 12 February 2013
Virtual microscopy includes digitisation of histology slides and the use of computer technologies for complex investigation of diseases such as cancer. However, automated image analysis, or website publishing of such digital images, is hampered by their large file sizes.
We have developed two Java based open source tools: Snapshot Creator and NDPI-Splitter. Snapshot Creator converts a portion of a large digital slide into a desired quality JPEG image. The image is linked to the patient’s clinical and treatment information in a customised open source cancer data management software (Caisis) in use at the Australian Breast Cancer Tissue Bank (ABCTB) and then published on the ABCTB website (http://www.abctb.org.au) using Deep Zoom open source technology. Using the ABCTB online search engine, digital images can be searched by defining various criteria such as cancer type, or biomarkers expressed. NDPI-Splitter splits a large image file into smaller sections of TIFF images so that they can be easily analysed by image analysis software such as Metamorph or Matlab. NDPI-Splitter also has the capacity to filter out empty images.
Snapshot Creator and NDPI-Splitter are novel open source Java tools. They convert digital slides into files of smaller size for further processing. In conjunction with other open source tools such as Deep Zoom and Caisis, this suite of tools is used for the management and archiving of digital microscopy images, enabling digitised images to be explored and zoomed online. Our online image repository also has the capacity to be used as a teaching resource. These tools also enable large files to be sectioned for image analysis.
The virtual slide(s) for this article can be found here: http://www.diagnosticpathology.diagnomx.eu/vs/5330903258483934
Over the past decade there has been a marked increase in the use of virtual microscopy. Digital slides offer many benefits over traditional microscopy, such as ease of access, archiving, annotation and sharing. Automatic identification and percentage calculation of malignant/cancer regions across hundreds of archived slides have become possible through the use of data mining analysis tools [1–3]. Multiple digital slide images can be opened and analysed at the same time. For example, Hematoxylin and eosin (H&E) and Periodic acid-Schiff (PAS) stained slides can be compared on the same screen, which is not possible in traditional microscopy. Because any number of users can examine the same specimen simultaneously and at any time, many institutions have started teaching virtual microscopy as part of their regular histology course, while others are considering moving in this direction [2, 4, 5].
The advantages of digitizing pathology slides are counterbalanced by the very large file sizes that are generated. A typical slide scanned at 400x magnification (0.25 μm/pixel) can be as large as 5 gigabytes, or even larger at higher resolutions. Such large file sizes hamper the downloading, viewing and analysis of digital slide images.
Proprietary image viewers such as Hamamatsu’s NDP.view or Aperio’s ImageScope only allow the user to take manual snapshots of the image being viewed, thereby limiting the maximum resolution to the resolution of the screen. For example, if the user’s display resolution is set at 1680x1050, then the maximum resolution of a snapshot would be approximately 1.76 megapixels (1680 × 1050 = 1,764,000 pixels). This is insufficient for snapshots to be published as zoomable slides on the website, which require snapshots at a resolution of 45 megapixels or more. Similarly, there is no tool that is able to split the digital slides scanned by the Hamamatsu NanoZoomer into smaller sections. Therefore, this study had two objectives: firstly, to enable the publishing of snapshots of virtual slides on a tissue bank website and the building of a searchable digital microscopy database, and secondly, to produce smaller images from large virtual slides to enable easy analysis and handling by analysis software such as Metamorph® (Molecular Devices, USA) or MATLAB (MathWorks, USA). In order to achieve these objectives, we have developed two open source tools: ‘Snapshot Creator’ and ‘NDPI-Splitter’.
The tools are designed for digital images obtained on the Hamamatsu NanoZoomer Digital Pathology (NDP) System (Hamamatsu Photonics K.K., Japan), in their proprietary NDPI file format. NDPI-Splitter and Snapshot Creator are developed in Java using the Standard Widget Toolkit (SWT); however, they are Windows dependent because they use the Hamamatsu SDK, available from Hamamatsu under their licensing agreement, for manipulating NDPI files. They also use JAI 1.1.3, available from http://ndpi-splitter.googlecode.com/files/jai-1_1_3-lib-windows-i586-jre.exe, and JAI Image IO 1.1, available from http://ndpi-splitter.googlecode.com/files/jai_imageio-1_1-lib-windows-i586-jre.exe. Apache Ant (http://ant.apache.org/) is used to build the projects.
NDPI-Splitter is a Java Swing based graphical user interface (GUI) application. It uses the library classes described above to determine the size and dimensions of the image, then uses this information to calculate how to split the image into sections. It also includes a module that filters out “empty” images.
Snapshot Creator is a Windows batch file with supporting Java classes. The Java classes use the library classes described above to determine the size and dimensions of the image, and to extract a region of the required size. The batch file then uses the Deep Zoom converter tool to prepare the JPEG image for viewing in Deep Zoom.
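The splitting calculation can be illustrated with a short sketch. This is not the NDPI-Splitter source: the class name `TileGridPlanner`, the method `plan` and the 4096-pixel tile size are assumptions made for illustration, and in practice the image dimensions would be read through the Hamamatsu SDK rather than passed in directly.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: plans a grid of tiles that covers a large slide image. */
final class TileGridPlanner {

    /** Returns tile rectangles of at most tileW x tileH pixels, in row-major order. */
    static List<Rectangle> plan(int imageW, int imageH, int tileW, int tileH) {
        if (imageW <= 0 || imageH <= 0 || tileW <= 0 || tileH <= 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        List<Rectangle> tiles = new ArrayList<>();
        for (int y = 0; y < imageH; y += tileH) {
            for (int x = 0; x < imageW; x += tileW) {
                // Tiles on the right and bottom edges are clipped to the image bounds.
                int w = Math.min(tileW, imageW - x);
                int h = Math.min(tileH, imageH - y);
                tiles.add(new Rectangle(x, y, w, h));
            }
        }
        return tiles;
    }

    public static void main(String[] args) {
        // Assumed example dimensions; a real slide is usually far larger.
        List<Rectangle> tiles = plan(10000, 7000, 4096, 4096);
        System.out.println(tiles.size() + " tiles, last = " + tiles.get(tiles.size() - 1));
    }
}
```

Each rectangle could then be passed to whatever routine extracts that region and writes it out as a TIFF section.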
To facilitate panning and zooming of images on our website, we have used the Microsoft Deep Zoom library. The JPEG converted images, produced by Snapshot Creator, are fed through the Deep Zoom converter tool to create tiles of images at various resolutions. Deep Zoomed images are then published on the website and linked to our customised version of the Caisis database.
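To give a sense of what the conversion into tiles at various resolutions involves, the sketch below estimates the size of a Deep Zoom style image pyramid. It assumes the commonly described layout in which each level halves the previous one down to a single pixel, and a 256-pixel tile size. The class and method names are hypothetical and this is not the converter tool itself.

```java
/** Hypothetical estimate of a Deep Zoom style tile pyramid (assumed layout). */
final class DeepZoomPyramid {

    /** Highest level index: ceil(log2(longest side)), using integer arithmetic. */
    static int maxLevel(int width, int height) {
        int longest = Math.max(width, height);
        return 32 - Integer.numberOfLeadingZeros(longest - 1);
    }

    /** Size of one dimension at a given level: ceil(fullSize / 2^(maxLevel - level)). */
    static int levelSize(int fullSize, int level, int maxLevel) {
        int shift = maxLevel - level;
        return (int) Math.max(1, (fullSize + (1L << shift) - 1) >> shift);
    }

    /** Total number of tiles over all levels for a given square tile size. */
    static long tileCount(int width, int height, int tileSize) {
        int max = maxLevel(width, height);
        long total = 0;
        for (int level = 0; level <= max; level++) {
            long cols = (levelSize(width, level, max) + tileSize - 1) / tileSize;
            long rows = (levelSize(height, level, max) + tileSize - 1) / tileSize;
            total += cols * rows;
        }
        return total;
    }

    public static void main(String[] args) {
        // An assumed 8000 x 6000 snapshot (48 megapixels).
        System.out.println("levels: " + (maxLevel(8000, 6000) + 1));
        System.out.println("tiles:  " + tileCount(8000, 6000, 256));
    }
}
```

Only the tiles for the region and zoom level currently on screen need to be downloaded, which is what makes panning and zooming of a large snapshot practical in a browser.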
Although Snapshot Creator and NDPI-Splitter share common libraries to handle the manipulation of proprietary NDPI files, they are used in different contexts.
Integration with Caisis
After the successful creation of JPEG images from the NDPI files, Snapshot Creator links the snapshots to the database using the filename of the image. The filename is an identifier of the slide in the ABCTB customised open source cancer-research database, Caisis, wherein histology slides are recorded as specimens. A stored procedure updates the matched field of the Specimens table. Snapshot Creator can be used with other SQL databases, because the database settings are configurable in the configuration file (snapshot-creator.properties); however, the SQL stored procedures have to be re-written to match the database structure of the prospective system. A full customised copy of Caisis, in use by the ABCTB, can be requested from the corresponding author. Source code of Caisis and our tools is available under the GNU General Public Licence.
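The filename-based linking step might look roughly like the following sketch. It uses only standard JDBC calls, but the property key `db.linkProcedure`, the procedure name `LinkSnapshotToSpecimen` and its two parameters are hypothetical; the actual keys in snapshot-creator.properties and the Caisis stored procedure are not described here.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Properties;

/** Hypothetical sketch of linking a snapshot to a specimen record by filename. */
final class SnapshotLinker {

    /** Strips directory and extension, e.g. "C:\snapshots\ABC-123.jpg" gives "ABC-123". */
    static String slideIdFromFilename(String path) {
        int slash = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        String name = path.substring(slash + 1);
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }

    /** Builds the JDBC escape syntax for a two-parameter stored procedure. */
    static String callSyntax(Properties config) {
        // Property key and default procedure name are assumptions for illustration.
        String proc = config.getProperty("db.linkProcedure", "LinkSnapshotToSpecimen");
        return "{call " + proc + "(?, ?)}";
    }

    /** Calls the configured procedure with the slide identifier and the image path. */
    static void link(Connection conn, Properties config, String imagePath) throws SQLException {
        try (CallableStatement stmt = conn.prepareCall(callSyntax(config))) {
            stmt.setString(1, slideIdFromFilename(imagePath));
            stmt.setString(2, imagePath);
            stmt.executeUpdate();
        }
    }
}
```

Keeping the procedure name in configuration is one way to let the same Java code target a differently structured database, with only the stored procedure itself rewritten.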
If an empty tile algorithm is selected, the tiles that are determined by the algorithm to contain no digital pathology information are placed in a sub-directory called “empty_tiles”. A log file called “log.txt” is created which records the calculation used to decide if a tile is empty or not. The log file can be used to fine-tune the thresholds for assigning ‘empty’ status. The “emptiness” algorithms, which are based on a determination of the numbers of pixels diverging from a threshold background value, are not perfect and the accuracy of the results varies for different images. For this reason the ‘empty’ tiles are retained for review, in case tiles that are not empty have been discarded. The application’s default values can be configured through a properties file named NDPIsplitter.properties.
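A threshold-based emptiness test of the kind described above could be sketched as follows. The grey-level conversion, the parameter names and the example thresholds are assumptions for illustration, not the values or code used by NDPI-Splitter.

```java
import java.awt.image.BufferedImage;

/** Hypothetical emptiness test: counts pixels that diverge from the background. */
final class EmptyTileCheck {

    /** Fraction of pixels whose grey level differs from the background by more than tolerance. */
    static double divergingFraction(BufferedImage tile, int backgroundGrey, int tolerance) {
        int w = tile.getWidth();
        int h = tile.getHeight();
        long diverging = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = tile.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                int grey = (r + g + b) / 3; // simple average, an assumed conversion
                if (Math.abs(grey - backgroundGrey) > tolerance) {
                    diverging++;
                }
            }
        }
        return (double) diverging / ((long) w * h);
    }

    /** A tile is treated as empty when few enough pixels diverge from the background. */
    static boolean isEmpty(BufferedImage tile, int backgroundGrey, int tolerance, double maxFraction) {
        return divergingFraction(tile, backgroundGrey, tolerance) <= maxFraction;
    }
}
```

Logging the computed fraction for every tile, as the log file does for the real calculation, makes it straightforward to see where a threshold such as `maxFraction` should sit for a given batch of slides.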
Snapshot Creator and NDPI-Splitter are developed in Java and share common libraries to interact with the NDPI files. Technically they are very similar; however, they are used in two very different contexts. Snapshot Creator is used to publish lower-resolution JPEG images on a tissue bank web search engine. NDPI-Splitter, on the other hand, produces files that can be imported into sophisticated image analysis packages such as Metamorph. A current limitation of image analysis software is its dependency on computer specifications, and large images typically fail to be processed because of insufficient memory. In addition, the time required to extract TIFF images from large image files is a significant limitation in image analysis. Therefore the ability of NDPI-Splitter to split large files into smaller TIFF sections enables their import into, and analysis by, image analysis software.
Previous studies have reported different ways of performing automatic image analysis on virtual slides by identifying regions of interest [9–11]. For example, Romo et al. employed colour, intensity, orientation and texture to calculate a relevance score against a manually selected region of interest. By contrast, NDPI-Splitter does not identify regions of interest; rather, it creates files that can be imported into automated image analysis pipelines. In addition, NDPI-Splitter, using intensity- and compression-based algorithms, can identify ‘empty’ regions in which few or no pixels diverge from the background, which is a novel feature that streamlines the process of importing files for image analysis. This strategy reduces the requirement for manual review of tiles prior to image analysis and minimizes the input to downstream analysis, representing a significant time saving.
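The idea behind a compression-based emptiness test can be illustrated with the sketch below: a tile of near-uniform background compresses to a small fraction of its raw size, whereas a tile containing tissue does not. The use of `java.util.zip.Deflater` and any cut-off applied to the ratio are assumptions; the algorithm actually used by NDPI-Splitter may differ.

```java
import java.util.zip.Deflater;

/** Hypothetical compression-based emptiness measure for raw tile pixel data. */
final class CompressionEmptiness {

    /** Ratio of compressed size to raw size; values near zero suggest a uniform tile. */
    static double compressionRatio(byte[] pixels) {
        if (pixels.length == 0) {
            throw new IllegalArgumentException("no pixel data");
        }
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        try {
            deflater.setInput(pixels);
            deflater.finish();
            byte[] buffer = new byte[8192];
            long compressed = 0;
            while (!deflater.finished()) {
                compressed += deflater.deflate(buffer);
            }
            return (double) compressed / pixels.length;
        } finally {
            deflater.end(); // release native resources
        }
    }

    /** Treats a tile as empty when it compresses below an assumed cut-off ratio. */
    static boolean isEmpty(byte[] pixels, double maxRatio) {
        return compressionRatio(pixels) <= maxRatio;
    }
}
```

A measure like this needs no explicit background colour, which may make it a useful complement to an intensity threshold, although the cut-off still has to be tuned per scanner and stain.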
Snapshot Creator produces a snapshot, representing one quarter of the full slide image, for publishing on the website, allowing researchers to use the online image search engine and image viewer to determine rapidly whether the biobank holds the material they are interested in. If researchers are interested in applying to the bank for full scanned slides and related datasets, they can do so based on their rapid search of the online images, and full applications are assessed by peer-review.
Snapshot Creator takes the snapshot from the middle of the slide, in order to maximise the chance of including the cancer/malignant region in the snapshot, and in approximately 95% of our published images the malignant section is indeed present. However, for slides where the cancer region is markedly offset on the slide, the cancer region can be missed. In order to avoid this, all newly published images are manually reviewed. If the malignant region is not in the snapshot, a manual snapshot of the image is taken, and placed into the ‘JPEG Snapshot Processing’ folder (Figure 3) for processing the next night.
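The choice of a central snapshot covering one quarter of the slide can be expressed as a simple region calculation. The sketch below assumes that "one quarter" means one quarter of the area, that is, half the width and half the height, centred on the slide. The class and method names are hypothetical.

```java
import java.awt.Rectangle;

/** Hypothetical calculation of a centred snapshot region. */
final class CentreSnapshot {

    /** Centred region with half the width and half the height of the slide. */
    static Rectangle quarterRegion(int slideW, int slideH) {
        if (slideW <= 0 || slideH <= 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        int w = slideW / 2;
        int h = slideH / 2;
        // Equal margins on each side keep the region in the middle of the slide.
        return new Rectangle((slideW - w) / 2, (slideH - h) / 2, w, h);
    }

    public static void main(String[] args) {
        // Assumed slide dimensions for illustration.
        System.out.println(quarterRegion(80000, 60000));
    }
}
```

A fixed central region is cheap to compute but, as noted above, can miss tissue that sits near the edge of the slide, which is why the manual review step is retained.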
Caisis is an open source cancer research database, with built-in fields for various cancers such as adrenal, bladder, colon, kidney, penile, prostate, testicular, breast, urological and pancreas, and the addition of more diseases or new fields is easily achievable. We have customised Caisis to link snapshots and virtual slides derived from the Snapshot Creator and NDPI-Splitter tools. Using Caisis, images can be searched for based on patient history, treatment or biomarkers, and relevant images can then be easily identified and sent to researchers. Therefore Snapshot Creator and NDPI-Splitter, coupled with Deep Zoom and the customised Caisis database, provide complete management of virtual images. Other researchers have also indicated the future development of such tools; our open source tools therefore offer the research community an alternative to in-house development.
In summary, as virtual microscopy is moving into the mainstream of diagnostic pathology, teaching and research, the development of open source tools that manage, catalogue and process virtual slides is needed. A web search engine holding digitized images can be used in teaching environments to illustrate normal and abnormal cell structures of different cancer types, such as invasive or in situ cancer, and is broadly available for research and clinical pathology review. NDPI-Splitter, Snapshot Creator, Caisis and Deep Zoom are open source tools that provide the ability to make greater use of digital images and therefore broaden the range of applications for tissue bank images.
Availability and requirements
Project name: NDPI-Splitter
Project home page: http://code.google.com/p/NDPI-Splitter/. A full customised copy of Caisis, in use by the ABCTB, can be requested from the corresponding author.
Operating system(s): Windows
Programming language: Java
Requirements: Java, Hamamatsu SDK, JAI 1.1.3, JAI Image IO 1.1, Ant, Deep Zoom
License: GNU GPL version 3 
Australian Breast Cancer Tissue Bank (ABCTB) is covered by Protocol No: X12-0279 Ethics Review Committee, Royal Prince Alfred Hospital, Camperdown, NSW 2050 Australia.
- Mikula S, Trotts I, Stone JM, Jones EG: Internet-enabled high-resolution brain mapping and virtual microscopy. NeuroImage. 2007, 35: 9-15. 10.1016/j.neuroimage.2006.11.053.
- Paulsen FP, Eichhorn M, Bräuer L: Virtual microscopy–the future of teaching histology in the medical curriculum? Annals Anatomy Anatomischer Anzeiger. 2010, 192: 378-382. 10.1016/j.aanat.2010.09.008.
- Rojo MG, García GB, Mateos CP, García JG, Vicente MC: Critical comparison of 31 commercially available digital slide systems in pathology. Int J Surg Pathol. 2006, 14: 285-305. 10.1177/1066896906292274.
- Dee FR: Virtual microscopy in pathology education. Hum Pathol. 2009, 40: 1112-1121. 10.1016/j.humpath.2009.04.010.
- Szymas J, Lundin M: Five years of experience teaching pathology to dental students using the WebMicroscope. Diagn Pathol. 2011, 6: S13. 10.1186/1746-1596-6-S1-S13.
- Deep zoom features - microsoft silverlight. http://code.google.com/p/ndpi-splitter/
- Khushi M, Carpenter J, Balleine R, Clarke C: Development of a data entry auditing protocol and quality assurance for a tissue bank database. Cell Tissue Banking. 2012, 13: 9-13. 10.1007/s10561-011-9240-x.
- The GNU general public license v3.0 - GNU project - free software foundation (FSF). http://www.gnu.org/licenses/gpl-3.0.html
- Jondet M, Agoli-Agbo R, Dehennin L: Automatic measurement of epithelium differentiation and classification of cervical intraneoplasia by computerized image analysis. Diagn Pathol. 2010, 5: 7. 10.1186/1746-1596-5-7.
- Romo D, Romero E, Gonzalez F: Learning regions of interest from low level maps in virtual microscopy. Diagn Pathol. 2011, 6: S22. 10.1186/1746-1596-6-S1-S22.
- Kayser K, Schultz H, Goldmann T, Gortler J, Kayser G, Vollmer E: Theory of sampling and its application in tissue based diagnosis. Diagn Pathol. 2009, 4: 6. 10.1186/1746-1596-4-6.
- Khushi M, Carpenter JE, Balleine RL, Clarke CL: Electronic biorepository application system: Web-based software to manage receipt, peer review, and approval of researcher applications to a biobank. Biopreservation Biobanking. 2012, 10: 37-44. 10.1089/bio.2011.0038.
- Lien C-Y, Teng H-C, Chen D-J, Chu W-C, Hsiao C-H: A Web-based solution for viewing large-sized microscopic images. J Digital Imaging. 2009, 22: 275-285. 10.1007/s10278-008-9136-x.
- Huss S, Fronhoffs F, Buttner R, Heukamp L: Web-based database for the management of tissue specimens in a transregional histological research facility. Diagn Pathol. 2011, 6: 17. 10.1186/1746-1596-6-17.
- Kayser K: Introduction of virtual microscopy in routine surgical pathology - a hypothesis and personal view from Europe. Diagn Pathol. 2012, 7: 48. 10.1186/1746-1596-7-48.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.