Our experiment demonstrated reliable measurement of HER2 expression by IHC digital image analysis based on the membrane connectivity estimate. The algorithm was run "plug-and-play" on the TMA images, with no attempt to calibrate for potential image variation caused by scanning or IHC procedures. Manual annotation of the tumour tissue was not performed; however, spots containing DCIS or an insufficient amount of tumour tissue were excluded from digital analysis by visual evaluation. Under these conditions, the digital analysis (DA) was in almost perfect agreement with the pathologist's score (VE) and exceeded the latter in detecting FISH-positive patients.
We tested the agreement between the visual and digital evaluations in two sets of analyses: it was almost perfect both at the level of the individual spot (kappa 0.86 and 0.87, with VE1 and VE2, respectively) and at the patient level (kappa 0.80 and 0.86, with VE1max and VE2max, respectively). In general, the level of agreement in our study was among the highest reported in previous studies using various digital analysis platforms [5, 6, 9, 10, 14–17], although caution is needed when comparing across studies with different designs. In both VE and DA, we used the maximum TMA spot value to define each patient's HER2 IHC status. This approach has been tested previously and, in our view, is a better way to summarize TMA data per patient than the mean or median value, especially when tissue heterogeneity is a concern. In addition, the maximum spot value increases the sensitivity of HER2 detection and may compensate for the limited tissue sampling in TMAs.
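To make the per-patient summary concrete, the following is a minimal sketch, not the analysis code used in the study, of how maximum spot scores could be derived and compared between two evaluations. The patient identifiers and spot scores are invented for illustration, the 0/1+ category is coded as 1, and unweighted Cohen's kappa is assumed because the kappa variant is not specified in this section.

```python
from collections import Counter

def max_score_per_patient(spots):
    """Summarize (patient_id, score) pairs by the maximum score per patient."""
    best = {}
    for patient_id, score in spots:
        best[patient_id] = max(score, best.get(patient_id, score))
    return best

def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the marginal frequencies of each category.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical toy data: two spots per patient, scores 1 (0/1+), 2 (2+), 3 (3+).
ve_spots = [("P1", 1), ("P1", 2), ("P2", 3), ("P2", 3),
            ("P3", 1), ("P3", 1), ("P4", 1), ("P4", 2)]
da_spots = [("P1", 2), ("P1", 2), ("P2", 3), ("P2", 2),
            ("P3", 1), ("P3", 1), ("P4", 1), ("P4", 1)]

ve_max = max_score_per_patient(ve_spots)
da_max = max_score_per_patient(da_spots)
patients = sorted(ve_max)
kappa = cohen_kappa([ve_max[p] for p in patients],
                    [da_max[p] for p in patients])
print(f"patient-level kappa on toy data: {kappa:.2f}")
```

Taking the maximum means that a single 2+ or 3+ spot is enough to assign the patient to that category, which is why this summary favours sensitivity over the mean or median.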
As expected from previous studies [6, 14, 15], the 0/1+ and 3+ IHC categories were consistently discriminated by both the VE and the DA, whereas most discrepancies occurred in the 2+ score category. Although it may sound paradoxical, these discrepancies may bring the greatest "added value" of integrating digital analysis into the routine pathology work-up of HER2 testing. Extrapolating our experiment to a clinical setting would mean that, in the cohort of 152 patients with early ductal carcinoma of the breast, HER2 IHC evaluated once by one pathologist (VE1max) would have revealed 8 patients with HER2 IHC 2+, with 8 reflex FISH tests performed. Including the DA would have resulted in an additional 14 HER2 IHC 2+ cases followed by the obligatory 14 FISH tests, thus detecting another 3 HER2-amplified cases (Table 5, lines 21, 22, 29). If the decision to perform a reflex FISH test were based on the IHC 2+ score by either VE1max or DA, that would have resulted in 19 FISH-positive cases compared to 16 by the VE1max-based decision alone (a 19% increase in the number of HER2-amplified cases detected in the cohort). In the setting where the pathologist evaluates the IHC twice (VE1max and VE2max), the second review would have resulted in an additional 8 HER2 IHC 2+ cases followed by the obligatory 8 FISH tests, thus detecting 1 additional HER2-amplified case; taking the DA results into account would require another 8 FISH tests, with another 2 HER2-amplified cases detected. Considering the potential consequences of a misdiagnosed HER2 status in 2 or 3 patients in the cohort of 152, weighed against the "price" of adding an automated digital analysis step and roughly 5-8 additional FISH tests per misdiagnosed case, the "balance" seems to be on the positive side.
On the other hand, adding the DA would have "saved" 2 or 3 FISH tests (compared to VE2max and VE1max, respectively) by suggesting the IHC 3+ score instead of the pathologist's 2+ score (Table 5, lines #35-37). However, one of these cases (#35) was negative by FISH, revealing a potential lack of specificity of the DA alone. In contrast to other studies [19, 20], our DA did not promise a decreased number of IHC 2+ cases or increased specificity in detecting HER2-amplified cases. This latter statement, however, must be taken with caution, since the individual "sensitivity" of pathologists may shift the VE results in different directions relative to the DA (inter-observer variability was not tested in the present study). In summary, we suggest that the membrane connectivity DA would be most useful as a decision-support and quality assurance tool, alerting pathologists to borderline 0/1+ versus 2+ and 2+ versus 3+ HER2 IHC cases, thus improving the accuracy of HER2 testing, but without the expectation of significant savings from avoiding unnecessary FISH tests. Nevertheless, improved accuracy of HER2 testing, without having to perform FISH in all cases, presents a reasonable economic trade-off. Although these considerations are based on TMA analyses, whereas the current routine for pathology HER2 testing is based on whole-section samples, our data are at least representative of, and simulate, the cases in which only limited tumour samples are available for testing.
The pathologist's intra-observer agreement was slightly better than the agreement between the pathologist and the digital analysis. However, the DA appeared to be more accurate in detecting FISH-positive patients. Interestingly, the second visual evaluation (VE2) was slightly more "sensitive" than VE1: it detected more 2+ patients and rescued 1 FISH-positive patient who had been assigned to the 0/1+ category by VE1. This increase in sensitivity is likely the result of a learning curve, with the pathologist adapting to the evaluation of small tissue samples in the TMAs as opposed to the IHC whole-section slides used in routine pathology practice. This aspect may present an additional benefit of the DA, not only in TMA analyses but also whenever only a small tumour sample is available.
The objectivity of the digital analysis depends on numerous factors; one particular factor is the accuracy of tumour tissue sampling for the analysis. If non-tumour tissue is included in the analysis, it may "dilute" the percentage of positive cells. In our experiment, no manual or automated annotation of the tumour tissue was performed; nevertheless, the DA recruited more 2+ and 3+ spots and patients than the VE. Inevitably, our TMA spots contained variable proportions of tumour and non-tumour tissue, and the digital analysis results could have been distorted without proper selection of the tumour tissue. However, since membrane connectivity is a non-cell-based estimate and does not require a distinction between tumour and non-tumour cells, the only prerequisite for the digital analysis was a sufficient amount, rather than proportion, of tumour tissue in the ROI. This also provided the benefit of avoiding manual annotation of the ROI, a laborious and potentially biasing step of image analysis.
With regard to the detection of FISH-positive patients, the digital analysis provided the maximum accuracy of IHC interpretation possible in our TMAs. As outlined in the Results section, the "false-positive" and "false-negative" cases by DAmax were also discrepant by VE1max and VE2max and most likely represented true biological variation of HER2 gene amplification and expression and/or possible issues in tissue processing [21–26]. Although HER2 FISH status is commonly used as a "gold standard" in HER2 IHC studies, in a small proportion of cases it may remain discrepant due to tissue heterogeneity, CEP17 polysomy/amplification (if only the HER2/CEP17 ratio is used to define the HER2 status), or other unrecognized causes of variation [27–30]. Our data reveal a subpopulation of patients in whom conventional HER2 FISH positivity criteria based on the HER2/CEP17 ratio may not be sufficient, and support the need to further explore the biological continuum of HER2 positivity and the clinical relevance of the test [30–33]. Although an analysis of this complexity is beyond the scope of the present study, it is important to note that the membrane connectivity estimate represents a continuous variable of HER2 expression by IHC and can serve better than a categorical IHC score in statistical analyses exploring the relationship between HER2 expression and amplification. In support of this perspective, we found significant correlations of the IHC membrane connectivity with the FISH results: HER2 copy number (r = 0.67), HER2/CEP17 ratio (r = 0.57), and mean CEP17 number per cell (r = 0.39), similar to the recent report of Vranek et al. (although the correlation with CEP17 did not reach statistical significance in that study of patients with CEP17 polysomy). Of note, automation and further quantification of FISH testing, with increased accuracy and capacity of the test, seem to be an important step towards further progress.
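As an illustration of why a continuous estimate is convenient for such analyses, the sketch below computes a correlation coefficient between a connectivity-like variable and a FISH-like variable. It assumes Pearson's r, which may differ from the coefficient actually used in the study, and all values are hypothetical toy numbers rather than study data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equally long numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares of the centred values.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical per-patient values, for illustration only.
connectivity = [0.10, 0.40, 0.50, 0.90]   # continuous IHC estimate
her2_copies = [2.0, 3.5, 6.0, 12.0]       # mean HER2 copy number per cell

r = pearson_r(connectivity, her2_copies)
print(f"r on toy data: {r:.2f}")
```

A categorical 0/1+, 2+, 3+ score would collapse these values into three levels and discard the within-category variation that the coefficient uses.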