Automated Selection of Hotspots (ASH): enhanced automated segmentation and adaptive step finding for Ki67 hotspot detection in adrenal cortical cancer

Background In prognosis and therapeutics of adrenal cortical carcinoma (ACC), the selection of the most active areas in proliferative rate (hotspots) within a slide and objective quantification of the immunohistochemical Ki67 Labelling Index (LI) are of critical importance. In addition to intratumoral heterogeneity in proliferative rate, i.e. levels of Ki67 expression within a given ACC, lack of uniformity and reproducibility in the method of quantification may confound an accurate assessment of Ki67 LI. Results We have implemented an open source toolset, Automated Selection of Hotspots (ASH), for automated hotspot detection and quantification of Ki67 LI. ASH uses the NanoZoomer Digital Pathology Image (NDPI) splitter to convert the NDPI format digital slide scanned on the Hamamatsu instrument into a conventional tiff or jpeg format image, which is then processed by an automated segmentation and adaptive step finding hotspot detection algorithm. Quantitative hotspot ranking is provided by the open source application ImmunoRatio as part of the ASH protocol. The output is a ranked set of hotspots with concomitant quantitative values based on whole slide ranking. Conclusion We have implemented an open source tool for automated detection and quantitative ranking of hotspots to support histopathologists in selecting the ‘hottest’ hotspot areas in adrenocortical carcinoma. To provide the wider community with easy access to ASH, we implemented a Galaxy virtual machine (VM) of ASH, which is available from http://bioinformatics.erasmusmc.nl/wiki/Automated_Selection_of_Hotspots. Virtual Slides The virtual slide(s) for this article can be found here: http://www.diagnosticpathology.diagnomx.eu/vs/13000_2014_216


Background
Adrenal cortical carcinoma (ACC) is a rare type of endocrine malignancy with an estimated incidence of 0.7-2.0 cases per million population per year and a poor overall prognosis [1]. According to recent evidence from the European Network for the Study of Adrenal Tumors (ENS@T) ACC study group, the resection status and the Ki67 labelling index (LI) in both localized and advanced ACC [2,3] constitute the most relevant prognostic parameters [4]. In this regard, it has been suggested that the histopathology report should include Ki67 LI along with confirmation of the adrenocortical origin on immunohistochemical grounds, Weiss score and resection status [4]. Importantly, Ki67 LI has been integrated in treatment flow charts for ACC patients with either tumor amenable to radical resection or advanced disease [4].
Taken together, the production of accurate and reproducible Ki67 LIs remains a key issue and main responsibility of pathologists. It should be recognized that various factors, such as pre-analytical, analytical, interpretation, scoring, and data analysis, might affect Ki67 LI [5]. Given the biological heterogeneity of Ki67 immunostaining across tumor specimens [5,6], the area of slide read has been controversial for Ki67 LI assessment e.g. in breast cancer [5,7]. According to the European Society of Neuroendocrine Tumors (ENETS), the mitotic count and the Ki67 LI should be assessed in areas with the highest proliferating activity (hotspots) in order to determine the proliferation grade in gastroenteropancreatic neuroendocrine tumors (GEP-NETs) [8]. As far as ACCs are concerned, there is not only lack of studies addressing the issues of a potential biological heterogeneity of Ki67 staining and interobserver variation, but also different methods of objective quantification of the Ki67 proliferative index.
In routine diagnostic practice, representative areas of slides are manually selected by histopathologists using visual examination of whole mount Ki67-immunostained slides at a low magnification. Of note, this process might lack reproducibility and affect the Ki67 LI [5]. Since digitized immunohistochemical (IHC) stained tissue sections have become amenable to the application of computerized image analyses, two independent groups have developed either a hybrid clustering approach for the detection of Ki67 hotspots in whole tumor slide images [9] or a simplified computerized method for hotspot detection in digitized IHC slides [10]. In this context, we developed Automated Selection of Hotspots (ASH) to provide clinical labs with the ability to determine the most active areas in proliferative rate within a slide and subsequently quantitate Ki67 LI using a desktop PC without requiring extensive bioinformatics support. ASH uses Galaxy [11] both as a simple graphical user interface and to join the components of ASH into an analytical workflow for hotspot detection. Galaxy is in turn contained in a VMware virtual machine (VM) [12], which ensures that the system is platform independent. The use of VM technology has been highlighted by Nocq et al. [13] as a way to improve the usability of next generation sequencing software by simply sharing entire installations.
We believe that this is the first time that a Galaxy-VM has been used to deliver hotspot detection software, either for a single user (on a personal computer) or for multiple users (on a server), with the same easy access via the Galaxy graphical user interface (GUI).

Method
ASH is delivered as a virtual machine which consists of three classes: NDPI Segmentation, Adaptive Step Finding and Reporting Visualization (Figure 1). NDPI Segmentation uses the previously described NDPI splitter [14] to split the input image file into an A × B matrix of blocks, followed by a step shift of 1/2 of a split image and quantitation of Ki67 in all images using ImmunoRatio [15], as implemented in ASH. This preprocessing step provides an initial quantitative ranking of all blocks, from which the top 10 are used to focus in on the actual 'hotspot' fields. To ascertain the exact hotspot positions on the image, we developed an Adaptive Step Finding class that adaptively determines the shifting step size and trades off hotspot detection resolution against system complexity. This class uses three of the same functions (Image shifting, ImmunoRatio and Ranking) that are used by NDPI Segmentation (Figure 1); however, in this class eight blocks are created around the region selected by NDPI Segmentation (Figure 2). The rectangular area is shifted by a step size that shrinks by 50% in every adaptive loop.
ASH provides an end to end workflow for hotspot detection using the functionality of a Galaxy GUI to provide the user with a simple data upload and html style reporting environment.

Implementation
The application is for digital images obtained on the Hamamatsu NanoZoomer Digital Pathology (NDP) System (Hamamatsu Photonics K.K. Japan), in their proprietary NDP Image (NDPI) file format. NDPI image segmentation using NDPI-splitter is available from [16]. Quantitation of segmented blocks with ImmunoRatio is available from [17]. For image processing, analysis, and visualization, we adopted OpenCV [18]. The ASH software tool is developed on the Ubuntu 12.04 [19] Linux operating system, as a Galaxy application [11] and is distributed as a VMware virtual machine [12] for a Windows user.
The detection of hotspots uses adaptive step finding methodology which has been utilized in engineering for many years [20] and extensively evaluated and validated [21]. Experimental evaluation has demonstrated the effectiveness of the adaptive step size [22] and the adaptive step finding method applied in ASH has the same functionality. The selection of the step size is critical both from the point of view of computational efficiency and detection performance.
To simplify the use of ASH, we have implemented Galaxy within the same virtual machine (VM) to provide a standardized graphical user interface (GUI) for accessing, running and visualizing ASH. Galaxy is an open, web-based platform [23], and we developed tools to upload image files, to analyse the files with ASH in batch mode, and to deliver an HTML report of the selected image with the quantitative ranking of the hotspots displayed in that image. All components and dependencies were built into a VMware virtual machine (VM) [12], which is an environment that is used like any physical computer [24] but can also be shared by download. The entire virtual machine is usually contained in a few files on the host computer (the physical machine that the virtual machine is running on). This means that installing all the dependencies required by ASH, including NDPI splitter, ImmunoRatio, OpenCV and Galaxy, is replaced by just having VMware installed.

Automated selection of hotspots
The overall workflow for the image analysis, outlined in Figure 3, comprises the main classes developed for ASH: NDPI Segmentation, Adaptive Step Finding and Visual Reporting.

NDPI segmentation
In the first class, NDPI Segmentation, the whole digital image scanned on the Hamamatsu NanoZoomer is first divided with the NDPI splitter (Figure 3). NDPI splitter performs the basic split of a single (100 K × 100 K pixel) NDPI image into thousands of smaller (2 K × 1 K pixel) images known as image blocks.
Step shifting by ½ the size of an image block is performed to provide overlapping blocks, in order to scan more area and improve ImmunoRatio detection resolution. For the primary selection of hotspots, a ranked list of these image blocks is determined based on the quantitation of each block using ImmunoRatio (Figure 4). Step shifting is also illustrated in Figure 2, where the black block offset from the yellow block indicates a 1/4 step shift. ImmunoRatio provides quantitative image analysis of estrogen receptor (ER), progesterone receptor (PR), and Ki67 immunostained tissue sections [15]. In our software, the ImmunoRatio results are ranked and used to determine the hotspot areas.
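The overlapping-block idea can be illustrated with a minimal sketch. It is not the ASH source code: the function name `block_origins` and its parameters are hypothetical, and it only enumerates block positions on a pixel grid, assuming that a step of ½ block is applied along both axes.

```python
def block_origins(image_w, image_h, block_w, block_h, step_fraction=0.5):
    """Return top-left (x, y) corners of blocks that fit inside the image.

    step_fraction is the shift between neighbouring blocks, expressed as a
    fraction of the block size (0.5 gives 50% overlap, 1.0 gives no overlap).
    All names here are illustrative assumptions, not part of ASH.
    """
    step_x = max(1, int(block_w * step_fraction))
    step_y = max(1, int(block_h * step_fraction))
    origins = []
    for y in range(0, image_h - block_h + 1, step_y):
        for x in range(0, image_w - block_w + 1, step_x):
            origins.append((x, y))
    return origins


# A 4 K x 2 K region covered by 2 K x 1 K blocks:
plain = block_origins(4000, 2000, 2000, 1000, step_fraction=1.0)
shifted = block_origins(4000, 2000, 2000, 1000, step_fraction=0.5)
print(len(plain), len(shifted))
```

In this toy case the basic split yields 4 blocks, whereas the ½ step shift yields 9 overlapping blocks over the same area, which is why step shifting scans more candidate positions.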

ASH Virtual Machine
The whole scanned image is segmented with NDPI splitter, as shown from the left upper image to the right upper image in Figure 4.
Based on the split images, we shift them by 1/4 of the side length, as shown from the right upper image to the bottom image in Figure 4. After the successful creation of JPEG images from the NDPI files, we use ImmunoRatio to calculate the ImmunoRatio percentage (IR%) for each image block, and rank the top 10 blocks by IR%.
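The ranking step itself is simple and can be sketched as follows. The function `rank_blocks` and the block identifiers are hypothetical, and the IR% values are invented for illustration; in ASH the values would come from ImmunoRatio.

```python
def rank_blocks(ir_percent_by_block, top_n=10):
    """Return the top_n (block_id, IR%) pairs, highest IR% first.

    ir_percent_by_block maps a block identifier to its ImmunoRatio
    percentage; both the identifiers and values are assumed inputs.
    """
    ranked = sorted(ir_percent_by_block.items(),
                    key=lambda item: item[1],
                    reverse=True)
    return ranked[:top_n]


# Invented IR% values for twelve blocks, for illustration only.
scores = {f"block_{i:02d}": float(i) for i in range(12)}
top10 = rank_blocks(scores)
print(top10[0], len(top10))
```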

Adaptive step finding
In this part, a smaller step finding procedure is applied to the top 10 images (regions of interest) obtained from the previous segmentation, ImmunoRatio and ranking procedure. The initial iteration uses 1/2 of the shifting step from the last iteration, followed by more sensitive steps, such as a 1/4 step (Figure 2), to precisely select the appropriate region of interest. Subsequently, the averaged top 10 ratios of the current iteration are compared to the previous top 10 ratios. The Step Finding procedure stops when the slope of ImmunoRatio against block number (as shown in Figure 5B) is within a preset threshold of 0.01.
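A simplified sketch of such a loop for a single region of interest is given below. It is an illustration under stated assumptions rather than the ASH implementation: `score` is a hypothetical stand-in for the ImmunoRatio call, the stopping rule is approximated by the gain in score per loop (not the slope against block number), the initial step of 1/4 block is one reading of the text, and coordinates are not clamped to the image.

```python
def adaptive_step_search(score, x, y, block_w, block_h,
                         initial_fraction=0.25, threshold=0.01, max_loops=8):
    """Refine one block position by probing its eight shifted neighbours.

    score(x, y) is an assumed callable returning the ratio for the block
    whose top-left corner is (x, y). The step is halved in every loop, and
    the search stops once the gain of a loop is within the threshold.
    """
    step_x = block_w * initial_fraction
    step_y = block_h * initial_fraction
    best = score(x, y)
    for _ in range(max_loops):
        # Eight blocks around the current position (centre excluded).
        neighbours = [(x + dx * step_x, y + dy * step_y)
                      for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                      if (dx, dy) != (0, 0)]
        candidate = max(neighbours, key=lambda p: score(*p))
        candidate_score = score(*candidate)
        gain = candidate_score - best
        if gain > 0:
            (x, y), best = candidate, candidate_score
        if gain <= threshold:
            break
        step_x /= 2  # shrink the step by 50% for the next loop
        step_y /= 2
    return x, y, best


def toy_score(x, y):
    """Invented smooth score with its maximum (20.0) at (50, 25)."""
    return 20.0 - ((x - 50) ** 2 + (y - 25) ** 2) / 1000.0


print(adaptive_step_search(toy_score, 0, 0, 200, 100))
```

With the toy score, the first loop moves the block from (0, 0) to (50, 25) and the second loop finds no better neighbour, so the search stops there.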

Visualization and reporting
In this part, we annotate the final top 10 regions in the original image and generate a report to list final top 10 ratios and their corresponding locations. Figure 6 shows an annotated image with Top 10 ImmunoRatio regions marked with red rectangles.
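As an illustration of the reporting step, the sketch below builds a small HTML table of ranked regions. The function `hotspot_report`, the column layout and the example values are assumptions made for this example; the actual ASH report format is not specified here.

```python
from html import escape


def hotspot_report(image_name, hotspots):
    """Build an HTML fragment listing hotspots, highest ratio first.

    hotspots is a list of (x, y, width, height, ratio_percent) tuples.
    The layout is a hypothetical example, not the ASH report format.
    """
    ranked = sorted(hotspots, key=lambda h: h[4], reverse=True)
    rows = []
    for rank, (x, y, w, h, ratio) in enumerate(ranked, start=1):
        rows.append(
            f"<tr><td>{rank}</td><td>{x}</td><td>{y}</td>"
            f"<td>{w}</td><td>{h}</td><td>{ratio:.2f}</td></tr>"
        )
    header = ("<tr><th>Rank</th><th>X</th><th>Y</th>"
              "<th>Width</th><th>Height</th><th>ImmunoRatio %</th></tr>")
    return (f"<h1>{escape(image_name)}</h1>\n<table>\n{header}\n"
            + "\n".join(rows) + "\n</table>")


# Invented regions, for illustration only.
print(hotspot_report("slide.ndpi", [(0, 0, 2000, 1000, 12.5),
                                    (500, 250, 2000, 1000, 18.35)]))
```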

Optimization of adaptive step selection
To determine the effect of step size on the performance of ASH, we calculated the averaged ImmunoRatio as the step size was decreased from 1/2 to 1/32 (Figure 5A). The averaged ImmunoRatio increases from 17.07% to a maximum of 18.35% as the step decreases (Figure 5 and Table 1). The step size and its corresponding ImmunoRatio, block number, and processing time are indicated in Table 1. Figure 2 shows an example with 1/4 step shifting and its 81 (9 × 9) blocks. The more blocks are calculated, the greater the chance of obtaining a block with a higher ImmunoRatio. Decreasing the step size from 1/2 to 1/32 requires a nonlinear increase in the number of blocks that must be calculated, from 25 blocks up to 4225 blocks, with an increase in average calculation time of >150 fold (i.e. from 25 seconds to about 1 hour per image) using a single core on an Intel Xeon X5650 CPU.
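The block counts quoted above (25 at a 1/2 step, 81 at a 1/4 step, 4225 at a 1/32 step) are consistent with a grid of (2/step + 1) positions per axis. This relation is inferred here from the quoted numbers rather than stated explicitly in the text, and the per-block time of about 1 s is an assumption derived from 25 blocks taking 25 seconds. A short sketch:

```python
from fractions import Fraction


def blocks_for_step(step):
    """Number of blocks for a given step, assuming (2/step + 1) per axis.

    The formula is inferred from the block counts quoted in the text
    (25, 81 and 4225 blocks for steps of 1/2, 1/4 and 1/32).
    """
    per_axis = int(2 / Fraction(step)) + 1
    return per_axis ** 2


SECONDS_PER_BLOCK = 1.0  # assumption: roughly 25 s for 25 blocks

for denominator in (2, 4, 8, 16, 32):
    n = blocks_for_step(Fraction(1, denominator))
    minutes = n * SECONDS_PER_BLOCK / 60
    print(f"step 1/{denominator}: {n} blocks, about {minutes:.1f} min")
```

Under these assumptions the block count grows roughly fourfold each time the step is halved, giving a ratio of 4225/25 = 169 between the finest and coarsest step, in line with the >150 fold increase in calculation time.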

Validation of quantitative hotspot detection
Adaptive step finding has been utilized in engineering for many years [20] and has been extensively evaluated and validated [21]. In [22], experimental evaluation demonstrates the effectiveness of the adaptive step size, and the adaptive step finding method applied in ASH has the same functionality. We have tested ASH on a set of >60 whole-slide digitally-scanned ACC images; in comparison with manual assessment of the labelling index, it achieved a strong correlation (rho >0.8, p = 0) as calculated with the Spearman rank order metric (publication in progress).
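For readers unfamiliar with the Spearman rank order metric, the sketch below computes rho as the Pearson correlation of average ranks, using only the standard library. The function names are hypothetical and the example values are invented; they are not the study data.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(a, b):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


# Invented labelling indices (manual vs automated), for illustration only.
manual = [5.0, 12.0, 20.0, 35.0, 8.0]
automated = [6.5, 11.0, 24.0, 30.0, 9.0]
print(round(spearman_rho(manual, automated), 3))
```

Because the metric depends only on rank order, the two invented series above, which order the five slides identically, give rho = 1 even though the individual values differ.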

Discussion
There are many commercial image analysis products, such as AQUA [25], Genie (Aperio) [26], TissueStudio (Definiens) [27] and InForm (PerkinElmer) [28], which are capable of high quality image processing and Ki67 quantitation. These are cited in other studies but are not freely available for comparative testing. There are also several open source image analysis tools (e.g. ImageJ [29], ImmunoRatio [17]) and multiple custom built in-house applications (e.g. Seedlink [9]); however, our requirements were that the application be open source and that it provide hotspot detection and quantitative Ki67 scoring in a desktop application. Thus, we developed ASH, an open source, open access application using Galaxy-VM technology, to support histopathologists in determining the most active areas in proliferative rate within a slide based on Ki67 LI staining. Additionally, since ASH was developed in a Galaxy environment, the current segmentation and quantitation methods can be easily supplemented or replaced, in the central ASH application (by the authors) or by a user (in their local ASH instance), with improved methods developed by other research teams.
Figure 4 NDPI Segmentation: the image is segmented using NDPI splitter, followed by step shifting of these blocks by ½ their size prior to quantitation. Panels: part of the image sample; basic split of the image; 1/2 step shifting of image block.
We implemented an overlapping block creation method, Step Shifting, both because NDPI splitter is only capable of splitting an image and cannot generate overlapping blocks, and to support our Adaptive Step Finding method, an approach which has been utilised in multiple engineering projects over many years [20][21][22].
When we shift the image block by different steps, we can see that the averaged ImmunoRatio increases as the step decreases. Therefore, we developed an adaptive step finding technique to obtain a tradeoff between hotspot detection resolution and processing time. Whilst the accuracy of the ImmunoRatio % per image block improves, there is an increased cost in calculation time. The optimal ratio of calculation time to accuracy occurs at a step of 1/16 with ~1000 blocks, given that the time to calculate one block is 1.0069 s on a single core of an Intel Xeon X5650 processor. Seedlink, a hybrid clustering method [9] that provides users with automatic identification of hotspots, is comparable to ASH with respect to usability and output. Seedlink requires a post-processing step to separate true hotspots from false positive hotspots to ensure accurate determination of Ki67, whilst ASH provides a ranked set of regions which the user can include or reject as part of the quantitation of Ki67. Thus ASH simplifies the decision making process by integrating the visualization of the detected hotspots with the quantitation of detected hotspots as a single output in the Galaxy-VM GUI.
Figure 5 The effect of (A) step size on ImmunoRatio % and (B) the number of blocks needed to calculate these step sizes. The average value as determined by ImmunoRatio (red line).
Since different types of coloured contamination and colour interference sometimes hamper hotspot detection, Adobe Photoshop or an alternative program that enables pathologists to delete parts of the scanned image, i.e. artifacts created during slide production, will improve the accuracy of the hotspot detection. Whilst we have tested ASH in a training set, it is clear that there are 'inactive' areas apparently with a 'low' Ki67 labelling index. Hence it is more prudent to compare automatically selected hotspot areas with hotspot areas selected by pathologists, and further studies are warranted to confirm our findings in a larger cohort.
Galaxy provides the user with a simple GUI to apply ASH using only a standard web browser (see Background, Galaxy reference [11]). Galaxy also provides remote access to ASH, so users can benefit from higher processing speed and larger storage space than a local computer offers. To ensure that ASH is available to individual researchers and/or pathologists as well as those who are supported by a bioinformatics team, we have implemented this Galaxy as a VMware VM. The combination of Galaxy in a VM provides a multi-user environment in which users can analyse their images in a password protected, user specific space, with the additional functionality of Galaxy and the capability to share any of the data, analysis and results. The current Galaxy-VM has been implemented to run using 1 CPU, but can be scaled up by resetting the VM once installed to run more CPUs (see project website for help documentation).

Conclusions
We have developed ASH, an open source Galaxy virtual machine application designed for Ki67 LI hotspot detection support, aimed at both individual and large diagnostic laboratories that have little bioinformatics experience or support. ASH is designed to assist pathologists and accelerate the time-consuming Ki67 hotspot selection procedure, enhance the detection resolution and eventually lead to improved, reproducible Ki67 LI reporting. Prior to image processing, pathologists should use an interface tool to exclude various artifacts, such as tissue folds, intrinsic/extrinsic pigmentation (deposit artifacts), necrotic areas, etc. ASH delivers a ranked list of hotspots as a combination of images and quantitative values for each hotspot detected, based on the Adaptive Step Finding algorithm [20][21][22] developed as part of ASH. The selection of the step size is critical from the point of view of both computational efficiency and detection performance, and although we have successfully tested ASH in a training set of whole-slide digitally-scanned ACC images, further studies are warranted to confirm its efficiency with a larger ACC set.