Automated quantification of proliferation with automated hot-spot selection in phosphohistone H3/MART1 dual-stained stage I/II melanoma

Background: Staging of melanoma includes quantification of a proliferation index, i.e., presumed melanocytic mitoses in H&E stains are counted manually in hot spots. Yet, its reproducibility and prognostic impact increase with immunohistochemical dual staining for phosphohistone H3 (PHH3) and MART1, which may also enable fully automated quantification by image analysis. To ensure manageable workloads and repeatable measurements in modern pathology, the study aimed to present an automated quantification of proliferation with automated hot-spot selection in PHH3/MART1-stained melanomas.
Methods: Formalin-fixed, paraffin-embedded tissue from 153 consecutive stage I/II melanoma patients was immunohistochemically dual-stained for PHH3 and MART1. Whole slide images were captured, and the number of PHH3/MART1-positive cells was manually and automatically counted in the global tumor area and in a manually and automatically selected hot spot, i.e., a fixed 1-mm² square. Bland-Altman plots and hypothesis tests compared manual and automated procedures, and the Cox proportional hazards model established their prognostic impact.
Results: The mean difference between manual and automated global counts was 2.9 cells/mm² (P = 0.0071), and the mean difference between automated counts in manually and automatically selected hot spots was 0.23 cells per hot spot (P = 0.96). In 77 % of cases, manual and automated hot spots overlapped. Fully manual hot-spot counts yielded the highest prognostic performance with an adjusted hazard ratio of 5.5 (95 % CI, 1.3–24, P = 0.024) as opposed to 1.3 (95 % CI, 0.61–2.9, P = 0.47) for automated counts with automated hot spots.
Conclusions: The automated index and automated hot-spot selection were highly correlated to their manual counterparts, but altogether their prognostic impact was noticeably reduced. Because correct recognition of only one PHH3/MART1-positive cell seems important, extremely high sensitivity and specificity of the algorithm are required for prognostic purposes. Nevertheless, automated analysis may still aid and improve the pathologists' detection of mitoses in melanoma and possibly other malignancies.

In accordance with a similar study of nodular melanoma [8], we recently demonstrated that proliferation measured by PHH3/MART1 stains was a much stronger prognostic marker than both Ki67/MART1 and H&E stains in stage I/II melanoma. Moreover, quantification in hot spots was highly superior to quantification in the global tumor area for prognostic purposes [10]. Even though PHH3/MART1 eased the detection of tumor-cell proliferation substantially compared with H&E, manual counting and hunts for a single mitosis were still very cumbersome and time-consuming, and hot-spot selection somewhat subjective. To ensure a manageable workload and repeatable measurements in modern pathology, fully automated quantification including automated hot-spot selection by image analysis seems favorable.
To date, automated quantification of PHH3 single stains has only been reported for the global tumor area in a few malignancies, predominantly breast cancer [15,16], and very recently for manually selected hot spots in melanoma [11]. Surprisingly, we found no studies evaluating the agreement between manual and automated PHH3 counts, merely their association [17]. Algorithms for automated hot-spot selection have, so far, only been reported for Ki67 single stains in adrenal cortical cancer [18], ovarian adenocarcinoma [19], and glioma [20]. Seemingly, all remain untested in the context of their clinical purpose. Double stains may, nevertheless, increase the accuracy of both automated PHH3 quantification and automated hot-spot selection, especially in melanoma, which often contains a prominent lymphocytic infiltrate with many proliferative lymphocytes. In addition, double stains enable calculation of various new index types, e.g., based on the exact area of melanocytic cells rather than just the previously reported number per outlined area. Thus, cellular density, which may vary considerably per unit area, may more easily be incorporated.
This study aims to present the optimal quantification of tumor-cell proliferation in PHH3/MART1-stained melanomas. Both manual and fully automated assessments are applied and compared in hot spots and global tumor areas.

Specimens
Formalin-fixed, paraffin-embedded tissue from 153 primary cutaneous melanomas from the Departments of Pathology at Aarhus University Hospital and Randers Hospital, Denmark, was included [10]. The patient characteristics of this previously described prospective cohort of consecutive stage I/II patients from 1997 to 2000 are presented in Table 1. The median Breslow thickness was 1.20 mm (range, 0.24-16 mm) and the median age 51 years (range, 23-79 years). Recurrent disease occurred in 43 patients (28 %), of whom three had thin melanomas. The median follow-up time was 12 years (range, 8-14 years) for event-free patients [10,21].
Prior to start, the Central Denmark Region Committee on Biomedical Research Ethics approved the study.

Manual quantification
The global dermal tumor area (Fig. 1a) was outlined manually in Visiopharm Integrator System 5.0.1.1122 (Visiopharm A/S, Hørsholm, Denmark). PHH3/MART1-positive cells, defined as nuclei more brown than blue with noticeable surrounding MART1 (Fig. 1c), were counted manually at 40× optical magnification within the global outline and in a manually selected hot spot, i.e., a fixed 1-mm² square from which epidermal regions and skin adnexa were omitted, if necessary (Fig. 1g). A previous comparison of one large 1-mm² hot spot versus four small adjoining 0.25-mm² hot spots within the dermis (similar to the conventional method) revealed no distinct difference in their prognostic impact [10]. All manual procedures were performed without knowledge of patient outcome.

Automated quantification
In the Visiomorph DP module (Visiopharm), the brown, blue, and red staining colors were highlighted by color deconvolution, and a standard deviation filter further enhanced the contours of PHH3-positive nuclei. In addition, the intensity band of the HSI (hue, saturation, intensity) color model highlighted PHH3 positivity. [...] intensity, and nuclear irregularity. One particular step included dilation of PHH3; hence, chromatin of especially the anaphase and telophase was fused into one object. Blue nuclei of PHH3-negative melanocytic cells were lastly changed into the image class MART1 (Fig. 1d-f), because the reference space was defined as the MART1-verified tumor area including nuclei.
Based on the labels of the image analysis, a hot spot was automatically detected using the processing step Object Heat Map, in which circles cover the objects of interest and produce a heat map according to how the objects cluster (Fig. 1h). Accordingly, the center of a circle (radius, 200 μm) was positioned over every PHH3/MART1-positive cell. Within the circle, the value 1 was added, and outside the circle, 0 was added. The hot-spot center was then defined as the pixel position with the most circle overlaps, i.e., the position with the highest summed (intensity) value. This point was encompassed by a 1-mm² square (Fig. 1h and i). Epidermal regions and skin adnexa were manually omitted, if necessary. Only the hottest hot spot was selected; however, more than one hot spot can, in theory, occur. The numerical outcome of such tied hot spots should, nevertheless, be identical or almost identical (minor regions may be subtracted from the 1-mm² reference area) when merely the number per square is reported.
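The circle-overlap principle described above can be illustrated with a minimal Python sketch. It is not the Visiopharm Object Heat Map implementation. It assumes that cell positions are available as (x, y) coordinates in μm, evaluates the heat map on a hypothetical 10-μm grid, and resolves ties by taking the first maximum; the function names `heat_map_hot_spot` and `count_in_square` are illustrative only.

```python
import numpy as np


def heat_map_hot_spot(cells_um, width_um, height_um, radius_um=200.0, pixel_um=10.0):
    """Sum one disc per positive cell and return the heat map and its hottest pixel.

    cells_um: iterable of (x, y) positions in micrometres (assumed input format).
    pixel_um: grid spacing of the heat map (an assumption, not from the paper).
    """
    nx = int(np.ceil(width_um / pixel_um))
    ny = int(np.ceil(height_um / pixel_um))
    # Coordinates of the pixel centres.
    xs = (np.arange(nx) + 0.5) * pixel_um
    ys = (np.arange(ny) + 0.5) * pixel_um
    gx, gy = np.meshgrid(xs, ys)  # both have shape (ny, nx)

    heat = np.zeros((ny, nx), dtype=np.int32)
    for cx, cy in cells_um:
        # Add 1 inside the circle around this cell, 0 outside.
        inside = (gx - cx) ** 2 + (gy - cy) ** 2 <= radius_um ** 2
        heat += inside

    # Pixel with the most circle overlaps (first one if several tie).
    iy, ix = np.unravel_index(np.argmax(heat), heat.shape)
    return heat, (float(xs[ix]), float(ys[iy]))


def count_in_square(cells_um, center_um, side_um=1000.0):
    """Count cells inside an axis-aligned square (default 1 mm x 1 mm) around center_um."""
    half = side_um / 2.0
    cx, cy = center_um
    return sum(1 for x, y in cells_um if abs(x - cx) <= half and abs(y - cy) <= half)
```

With three clustered cells and one isolated cell, the hottest pixel falls where all three circles overlap, and the 1-mm² square around it contains the three clustered cells only. Omission of epidermal regions and skin adnexa, which was done manually in the study, is not modelled.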

Index calculations
MART1-adjusted indices were defined as the number of PHH3/MART1-positive cells divided by the MART1-verified tumor area including nuclei, both within the outlined global tumor area (Fig. 1d and e) and within a hot spot. Hot-spot indices were also reported as merely the number of PHH3/MART1-positive cells within the hot-spot square.
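As a small worked example of the MART1-adjusted index, the following Python sketch divides a cell count by a tumor area. The function name `mart1_adjusted_index` and the input values are hypothetical; only the definition (positive cells per mm² of MART1-verified tumor area) is taken from the text.

```python
def mart1_adjusted_index(n_positive_cells, mart1_area_mm2):
    """PHH3/MART1-positive cells per mm^2 of MART1-verified tumor area."""
    if mart1_area_mm2 <= 0:
        raise ValueError("MART1-verified tumor area must be positive")
    return n_positive_cells / mart1_area_mm2


# Illustrative values only: in a small lesion, one extra cell shifts the index a lot.
small_lesion = mart1_adjusted_index(1, 0.39) - mart1_adjusted_index(0, 0.39)
large_lesion = mart1_adjusted_index(1, 5.0) - mart1_adjusted_index(0, 5.0)
print(round(small_lesion, 2), round(large_lesion, 2))
```

The example shows why a single misclassified cell matters more when the reference area is small: the index changes by 1 divided by the area.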

Statistical analyses
Analyses were performed in Stata 12.0 (StataCorp, College Station, TX, USA), and two-sided P-values less than 0.050 were considered statistically significant.
Bland-Altman plots and Wilcoxon signed-rank tests compared manual and automated procedures. Paired t tests compared the location (x and y coordinates) of hot-spot centroids in the image. In addition, the frequency of manual and automated hot spots that overlapped was established based on the distances between both the x and y coordinates of the square centers (see Fig. 2). The degree of overlap was divided into the categories perfect overlap (distances, < 0.10 mm), major overlap (distances, 0.10-0.74 mm), minor overlap (distances, 0.75-1.0 mm), and no overlap (distances, > 1.0 mm).
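The overlap categories can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's procedure: the text does not say how the x and y distances were combined, so the sketch assumes that the larger of the two decides the category (two axis-aligned 1-mm squares share area only if both distances are below 1 mm). Values between the reported bounds, e.g., 0.745 mm, are assigned by the thresholds in the code. The function name `classify_overlap` is hypothetical.

```python
def classify_overlap(manual_center_mm, automated_center_mm):
    """Categorize the overlap of two 1-mm^2 squares from their center coordinates (mm)."""
    dx = abs(manual_center_mm[0] - automated_center_mm[0])
    dy = abs(manual_center_mm[1] - automated_center_mm[1])
    d = max(dx, dy)  # assumption: the larger axis distance decides the category
    if d < 0.10:
        return "perfect"
    if d < 0.75:
        return "major"
    if d <= 1.0:
        return "minor"
    return "none"
```

For example, centers that are 0.5 mm apart along x and 0.05 mm apart along y fall into the major-overlap category.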
Univariate and multivariate analyses were performed using the Cox proportional hazards model. The group with low PHH3 was the reference group. The multivariate model included the index categorized by its median, the Breslow thickness (log-transformed), and ulceration.
Time to recurrent disease was calculated from the time of diagnosis, and patients who either were alive without recurrence at last clinical follow-up or died without evidence of melanoma were censored.
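The quantities behind a Bland-Altman comparison can be computed with a few lines of NumPy. The sketch below is not the Stata code used in the study; it assumes that differences are taken as automated minus manual and uses the conventional limits of agreement of the mean difference ± 1.96 standard deviations. The function name `bland_altman` and the example counts are hypothetical.

```python
import numpy as np


def bland_altman(manual, automated):
    """Return per-case means and differences, the bias, and the 95 % limits of agreement."""
    manual = np.asarray(manual, dtype=float)
    automated = np.asarray(automated, dtype=float)
    diff = automated - manual            # assumed sign convention
    pair_mean = (automated + manual) / 2.0
    bias = diff.mean()                   # mean difference
    sd = diff.std(ddof=1)                # sample standard deviation of the differences
    return {
        "pair_mean": pair_mean,          # x axis of the plot
        "diff": diff,                    # y axis of the plot
        "bias": bias,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
    }


# Made-up counts (cells/mm^2) for five lesions, for illustration only.
result = bland_altman([0, 2, 5, 7, 12], [1, 2, 8, 9, 15])
print(result["bias"], result["lower"], result["upper"])
```

Plotting `diff` against `pair_mean` with horizontal lines at `bias`, `lower`, and `upper` gives the usual Bland-Altman plot.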

Results
Global manual and automated PHH3 counts are compared in Fig. 3a, and their ability to detect PHH3/MART1 positivity is outlined in Table 2. The Spearman correlation coefficient was 0.77 (P < 0.0010).
The automated counts of the manually and automatically selected hot spots in cases with coinciding PHH3/MART1 positivity (n = 92) are compared in Fig. 3b. The Spearman correlation coefficient was 0.86 (P < 0.0010).
The prognostic ability of the different PHH3 quantification types is displayed in Table 3 for univariate analysis and in Table 4 for multivariate analysis. The ability to detect recurrent disease is outlined in Table 5 for fully manual and fully automated hot spots.

Discussion
The principal finding of the study was that automated and manual results were very comparable for both counting cells and selecting hot spots. The prognostic impact of automated quantification in the clinically preferred hot spot was, however, noticeably reduced compared with manual quantification. Algorithms may, nevertheless, still serve as valuable tools for pathologists, and the hot-spot detection seems readily applicable to other malignancies and stains.
Although automated indices, in general, were systematically higher than manual indices, global counts of PHH3/MART1-positive cells were remarkably comparable in most cases (Fig. 3a, Table 2), and their prognostic impact was, accordingly, in roughly the same range (Tables 3 and 4). Only 33 cell counts (22 %) differed by more than five cells per mm². They were primarily lesions with a very small MART1-verified tumor area (median, 0.39 mm²); hence, the misinterpretation of a single cell may alter the calculated index substantially. This was the case for 12 lesions where absolute counts actually differed by only one to four cells. In the remaining cases, the automated algorithm was unsuccessful because of overlying PHH3-positive lymphocytes (n = 7; Fig. 4a and b), heavy pigmentation (n = 6; Fig. 4c and d), pale MART1 (n = 5; Fig. 4e and f), or very dark MART1 that overlapped pixel intensities of PHH3 (n = 3; Fig. 4g and h). Only a few other groups have presented an automated quantification of PHH3, but without the improved accuracy of immunohistochemical double staining [11,15,16,22,23]. Even fewer have compared their automated and manual cell counts [11,15,16], and apparently the actual agreement of methods remains unreported [17]. In a study of melanoma, the association between manual and automated counts was strong by Pearson's correlation, and their prognostic impact comparable; however, the difference between mean counts seems large (approx. 6/mm² for manual counts vs. 16/mm² for automated counts) [11]. Our numbers were in the same range (7/mm² vs. 10/mm²), but the mean difference was considerably smaller. Previous automation of Ki67/MART1 stains has demonstrated usefulness in both diagnosis and prognosis of melanoma [10,24,25], but fundamental differences between stains preclude extrapolation of results. Ki67/MART1-positive cells are very abundant, which raises the cut-off point, and the cells are mostly circular because they are often in the G1, S, or G2 phase rather than in mitosis. This eases image analysis because algorithms that enhance circular structures may be utilized [25].
Because the prognostic impact of hot spots seems highly superior to global analysis [10] (Tables 3 and 4), an automated selection was developed. The fully automated and fully manual selections were very similar, and hot spots overlapped in most cases (77 %). In cases without overlap (n = 30), manual and automated cell counts were often in the same range (only nine differed by more than five cells). After reevaluation of these nine cases, mistakes pertained to heavy pigmentation (n = 5), overlying PHH3-positive lymphocytes (n = 2), pale MART1 (n = 1), and general failure of the algorithm's cluster circles to depict the correct hot spot (n = 1). After the manual and automated hot-spot selection, an automated count was performed for both hot-spot types. The automated algorithm selects the hot spot with cluster circles based on labels of the preceding automated image analysis. If correctly selected, the subsequent automated count of the automatically selected hot spot should always be higher than or equal to the automated count of the manually selected hot spot. This was the case in 65 of 92 lesions (71 %; Fig. 3b). In the remaining cases, the cluster circles failed because another, hotter hot spot actually existed. Yet, the difference between counts was very small (median, three cells per hot spot), and most hot spots actually still overlapped (n = 19). Nevertheless, a few errors must be expected in automated hot-spot selection, and these mostly pertain to the chosen radius of the cluster circles. Underlining their comparability, the prognostic performance of the manual and automated selection was very similar (Tables 3 and 4). No distinct parallel was observed between lesion size and hot-spot overlap, but hot spots were of course more likely to overlap completely in very small lesions. The mean MART1-verified tumor area was 2.69 mm² (range, 0.020-50 mm²) for the entire cohort.
Because no previous study has developed or evaluated automated hot-spot selection in PHH3 stains or evaluated the currently presented algorithms for Ki67 stains [18-20] in a clinical setting, studies supporting our comparability of manual and automated selection are warranted.
It has previously been established that the global MART1-verified tumor area is preferable to simply using the area within the global outline, which includes surrounding stroma, as reference space in melanoma diagnosis (Fig. 1d vs. Fig. 1f) [24], and this was also the case for prognostic purposes in this study (data not shown). This is possibly explained by lesion-dependent amounts of stromal tissue, which may also vary according to the subjective outline of the observer. Surprisingly, the MART1-adjusted hot spot of manual counts was quite inferior to just reporting the number of PHH3/MART1-positive cells within the square (Tables 3 and 4); hence, cellular density seems of less importance in melanoma hot-spot analysis compared with global analysis. The result may, however, just reflect the fact that an overall division into PHH3-positive and PHH3-negative patients is more important for prognostic purposes. Despite our concordance between manual and automated indices and hot spots, the prognostic impact of automated quantification in a hot spot was noticeably reduced compared with the fully manual quantification (Tables 3 and 4). This is possibly because of the study's low cut-off point (one cell per square for fully manual hot spots), which divides the cases into either PHH3-positive or PHH3-negative; hence, misinterpretation of only one cell is of grave importance in the cases with no vs. one PHH3/MART1-positive cell (n = 68). High sensitivity and specificity of both manual and automated analysis are thus required. Because the manual count was the gold standard, all lesions were very carefully examined by virtual microscopy, many repeatedly, and one could speculate whether some mitoses would have been left undetected in a routine setting with a heavy workload. In automated analysis, similarly high performance is difficult to achieve because a few errors inevitably will occur (Fig. 4). Primarily, it seems difficult to prevent false-positive counts of PHH3-positive non-melanocytic cells or brown pigment surrounded by MART1 and false-negative counts of tumor cells that express no or only sparse MART1 (Fig. 4a-f). In addition, proper recognition of PHH3/MART1-positive cells seems especially difficult because the shape of PHH3-positive nuclei changes throughout mitosis, and MART1 has a tendency to become pale and sometimes almost white around the nuclei (Fig. 1c) for unknown reasons. Furthermore, the anaphase and telophase may occasionally be overlooked or regarded as two cells rather than one (Fig. 4i and j), even though an algorithm may be designed to fuse the chromatin into one object. Inevitably, staining quality also varies despite standardization. The inability of algorithms to deal with variations in staining color and intensity between and within sections is, in general, a major drawback of digital image analysis. An algorithm that normalizes signals of immunohistochemical stains, e.g., by control sections, is thus highly needed. Chromogens other than DAB or the use of virtual double stains could, in addition, be explored to enhance the results of automated image analysis in melanoma.
Still, all our lesions were analyzed with the same fixed protocol; hence, manual review with adjustment of algorithms or rejection of results may improve the prognostic impact noticeably, and this is possibly a general necessity in the establishment of diagnostic digital pathology.
Though the prognostic impact of our fully automated quantification of PHH3 in hot spots diminished, algorithms may still serve as valuable tools for the pathologist. Adhering to the current staging manual where presence of [...] Given its high sensitivity, the labels or heat map of the global index quantification may vividly direct the observer's attention to cells that are most likely in mitosis, and the risk of overlooking that single mitotic cell may possibly be reduced. Furthermore, automated analysis with automated hot-spot selection may prove very useful in other malignancies or stains with higher clinical cut-off points than melanoma. For instance, in early breast cancer, a PHH3 cut-off of 13 cells per 1.8 mm² has been proposed to distinguish high- and low-risk patients [15], and a Ki67 index above 30 % was recently acknowledged as an indicator for adjuvant chemotherapy [26]. In addition, neuroendocrine tumors are divided into well or poorly differentiated by a Ki67 cut-off of 20 % or a hematoxylin-eosin mitotic index of 20 mitoses per 2 mm²; however, the suggested cut-offs for PHH3 are as low as four cells per 2 mm² [27].

Conclusions
In conclusion, manual detection of PHH3/MART1 positivity in a 1-mm² hot spot yielded an outstanding prognostic performance in our study of stage I/II melanoma; however, adjusting the reference space for tumor cellularity and automating the quantification reduced its prognostic impact noticeably. The automated index quantification and automated hot-spot selection were, nevertheless, very comparable to their manual counterparts. The automated results may thus still aid and improve the pathologist's detection of mitoses in melanoma and possibly be useful in other malignancies and future research studies. Differentiation into merely PHH3/MART1-positive or PHH3/MART1-negative cases was ideal for prognostic purposes in our cohort, but further studies are warranted to establish the optimal cut-off point for thin melanoma.

Fig. 4 (panels c-j) c Brown pigment embedded in or in close connection to MART1. d False-positive counts consequently occur. e Very pale MART1 stain of melanocytic cells. f A PHH3-positive mitosis is thus overlooked. g Very dark MART1 stain or artifact that resembles pixel intensities of brown staining. h Because the shape, in addition, resembles a nucleus, a false-positive count is made. i Telophase of a PHH3/MART1-positive cell. j The distance between the chromatin of the dividing nucleus is too large to fuse it into one object with the post-processing step dilation, and this mitosis is therefore overlooked