The percentage of Ki67-positive tumor cells in each spot image (Ki67%) was in very good agreement among the mD, dD, and cD sets when compared by simple linear regression analysis (Figure 4). Regression of the cD on the dD (R² = 0.92) reflects the impact of the expert editing of the dD as well as the accuracy of the DIA used to produce the dD. Regression of the cD on the mD (R² = 0.94) reflects the consistency of the DIA-assisted standard criterion (cD) production with the manually obtained standard criterion. A slight bias, in opposite directions, can be noted in the two comparisons.
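The agreement statistic used above can be illustrated with a minimal sketch of a least-squares fit and its coefficient of determination. The function name `linear_fit_r2` and the sample values are hypothetical and are not taken from the study data; the sketch only shows how a slope different from 1 or a non-zero intercept would reveal bias even when R² is high.

```python
def linear_fit_r2(x, y):
    """Least-squares fit y = slope * x + intercept, plus R^2.

    x and y are equal-length sequences of per-spot Ki67% values
    from two data sets (hypothetical inputs, for illustration only).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear regression, R^2 is the squared correlation.
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


# Made-up example: the second set reads 2 percentage points higher.
slope, intercept, r2 = linear_fit_r2([10, 20, 30, 40], [12, 22, 32, 42])
```

In this made-up example the fit is perfect (R² of 1) although every value is shifted by two percentage points, which is why the regression line itself, and not only R², needs to be inspected for bias.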
Besides the comparison of the summarized TMA spot indicators in the data sets, our method provides detailed information on the accuracy of detection and of IHC-positivity interpretation by DIA for individual cells. Despite the very good agreement of the Ki67% per spot between the data sets, the accuracy of detection of individual tumor cells was much lower. Based on the expert corrections made on the dD to produce the cD, on average 18 and 219 marks per spot were edited due to Genie and Nuclear algorithm errors, respectively (Figure 5); the mean number of marks per spot was 663. In all 158 TMA spots, a total of 105,486 nuclei were identified in the cD, while 39,710 (37.6%) expert corrections were made on the dD. These included 2,941 (2.8%) and 24,727 (23.4%) edits to correct under-detection of epithelial nuclear profiles by Genie and Nuclear, respectively, and 10,793 (10.2%) edits to correct over-detection of epithelial nuclear profiles by Genie or Nuclear. As described in the Methods section, the Nuclear component of the DIA was applied only to the epithelial mask already detected by the Genie component; therefore, the performance indicators of the Nuclear detection component are "Genie-dependent". When a nucleus is properly identified, the IHC staining interpretation (Ki67-positive versus negative) by the Nuclear algorithm can be regarded as excellent: overall, only 1,035 (1%) false positive and 214 (0.2%) false negative tumor nuclei were corrected by the experts (Figure 6).
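The breakdown of expert corrections can be checked with a short calculation. The sketch below uses only the counts reported in this paragraph; the dictionary keys and the helper name `percent_of_nuclei` are illustrative labels, not terms from the DIA software. It assumes that every percentage is expressed relative to the 105,486 nuclei in the cD.

```python
# Counts reported in the text for all 158 TMA spots.
TOTAL_NUCLEI_CD = 105_486

# Expert corrections by error type (labels are illustrative).
corrections = {
    "genie_under_detection": 2_941,
    "nuclear_under_detection": 24_727,
    "over_detection": 10_793,
    "false_positive_ki67": 1_035,
    "false_negative_ki67": 214,
}


def percent_of_nuclei(count, total=TOTAL_NUCLEI_CD):
    """Express a correction count as a percentage of cD nuclei."""
    return round(100.0 * count / total, 1)


# The five categories add up to the reported total of corrections.
total_corrections = sum(corrections.values())
breakdown = {name: percent_of_nuclei(n) for name, n in corrections.items()}
```

Under this assumption the five categories sum to the reported 39,710 corrections, and the detection errors (under- and over-detection) account for almost all of them, while the staining interpretation errors contribute only about 1.2% of nuclei.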
Importantly, our data highlight the impact of the accuracy definition on the DIA validation results: agreement was very good with regard to the Ki67% result per image, which could be interpreted as sufficient for clinical use. However, the accuracy of individual tumor cell detection was less satisfactory and may be taken as a warning sign on the road to developing robust automated DIA tools. Different applications may require more conservative accuracy definitions and validation procedures for specific DIA tasks.
Furthermore, our approach provided the benefit of "decomposing" the accuracy of three DIA components with a single expert editing procedure. Importantly, the Genie component outperformed the Nuclear algorithm in terms of tumor cell detection in our experiment. We are not aware of published data on the relative impact on DIA accuracy of the automated tumor (epithelial) tissue and tumor cell identification components. While the issue of automatic detection and segmentation of cell nuclei in histopathology images is well addressed, the accuracy of tumor tissue detection would require targeted studies. Lastly, discrimination between Ki67 IHC staining results (positive versus negative) by the DIA was excellent in our study.
We therefore suggest that even though the accuracy of the Ki67% (image-based estimate) over the whole spot series was very good, a more conservative cell-based validation approach could uncover the "functional anatomy" of the DIA tools and point further DIA improvement efforts in the right direction to achieve the most robust DIA processes and results.
The accuracy of the DIA tool used to produce the dD set determines the efficiency of the DIA-assisted cD production. In our case, the approach saved approximately two thirds of the manual editing; however, the effort to review the images remained the same. The efficiency can be further increased by improving the ergonomics of the stereology tool and by employing better calibrated DIA assistance. In our experiment, we applied systematic random sampling by stereology grid. It is a simple method to control the amount of manual work; however, the number of cells sampled depends on the variable amount of tumor tissue and cellularity in the images. This disadvantage can be compensated for by automatically resizing the stereology grid, based on the results of the DIA used to produce the dD set, in order to obtain an optimal number of cells to be reviewed per image.
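The grid resizing idea could be sketched as follows. This is not a procedure used in the study; it is a minimal illustration under the assumption that the number of cells sampled by a square grid is inversely proportional to the square of the grid spacing. The function `resized_grid_spacing` and its parameters are hypothetical, and `expected_cells` stands for the number of cells the current grid would sample according to the DIA results.

```python
import math


def resized_grid_spacing(current_spacing, expected_cells, target_cells):
    """Return a grid spacing expected to sample about target_cells.

    Assumption (not from the study): the number of sampled cells
    scales with 1 / spacing**2, so the spacing is rescaled by the
    square root of the ratio of expected to target cell counts.
    """
    if current_spacing <= 0 or expected_cells <= 0 or target_cells <= 0:
        raise ValueError("spacing and cell counts must be positive")
    return current_spacing * math.sqrt(expected_cells / target_cells)


# Hypothetical example: a highly cellular image where the current grid
# (spacing of 100 pixels) would sample 400 cells but 100 are wanted.
new_spacing = resized_grid_spacing(100.0, 400, 100)
```

In this hypothetical example the spacing doubles, which reduces the expected number of sampled cells fourfold; a sparsely cellular image would lead to a finer grid instead.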
Last but not least, the mD and/or cD sets are quality-assured and contain information on the exact location of the cells in the image; they can therefore be used as standard criterion templates to validate, calibrate, and train DIA tools. From a broader perspective, such reference data libraries may serve as benchmark datasets for automated DIA, a demand that is well recognized and addressed in computational neuroscience and in bioimage informatics in general, where tasks are of much higher complexity than digital IHC [14, 15].