Because the incidence of breast cancer continues to rise worldwide, especially at younger ages, increasing attention has been devoted to the treatment and prognosis of this malignancy. Ki-67 is a well-established biomarker closely related to the development, metastasis, and prognosis of various tumors. Indeed, Ki-67 is considered one of the most important protein markers to be evaluated in clinicopathological applications in breast cancer [1, 12]. Several studies have reported that automatic Ki-67 counting systems and stand-alone platforms, such as the Immuno Path and Immuno Ratio software, have been developed and applied to lung cancer, pancreatic cancer, lymphoma, breast cancer, and other tumors [23, 24]. Still, most of these systems cannot meet the need for automation in clinical medicine, because the existing Ki-67 algorithms cannot automatically locate the tissue regions of interest or automatically register IHC images with their corresponding HE images. Our work draws on image recognition and registration: it applies a classification model based on a convolutional network, using AI to identify IDC regions automatically, and combines it with traditional computer-based Ki-67 positivity algorithms. This combination allowed us to develop an effective method that extracts image ridge features to register Ki-67-stained IHC images with their HE images accurately and automatically on whole tissue sections of breast IDC, with good results, and to build automatic Ki-67 counting software on top of this accurate registration. Our results indicate that this approach is feasible, efficient, and accurate for registering IHC images with their HE images and for automatic scoring of Ki-67.
Moreover, we provide the digital images in which each Ki-67-positive and Ki-67-negative cell has been accurately labeled as a free, open public platform on which researchers can assess the performance of computer algorithms for automated Ki-67 scoring on IHC-stained slides.
WSI-based digital pathology has shown considerable advantages over the traditional mode of pathological diagnosis [3]. Several domestic and foreign pathology teaching and research departments with suitable hardware already use WSI in daily pathological diagnosis and scientific research [23,24,25]. Accurate and efficient labelling of the target WSI area is the key to digital pathology-related research [25]. Indeed, the key first step of this study was to label the IDC regions in the WSI images appropriately, so that the computer had reliable and accurate data to learn from. Through this study, we have explored a feasible set of programs and procedures for training labelling personnel on WSI images and, moreover, have strengthened the role of pathologists in computer-aided diagnosis and analysis.
At present, the most commonly used methods for evaluating registration quality are gray-level based, such as the sum of squared differences (SSD), normalized mutual information (NMI), and normalized cross-correlation (NCC). In this paper, we chose NCC as our registration evaluation method. It calculates the degree of matching between two images with a normalized correlation measure. NCC can effectively reduce the impact of illumination on the comparison, and its results are normalized to between 0 and 1, which makes the quality of a registration easy to quantify and judge. The NCC value of our registration model was 0.975, which shows that the degree of matching is very good and sufficient to meet practical needs. In addition, automatic registration produced some areas that did not match perfectly, and for these areas we tried manual adjustment to obtain a perfect match. However, testing showed that the difference in the positive rate of IHC sections between the manually adjusted and the automatic results was very small. Our analysis suggests that this was because the registration model already matched the WSIs closely, so slight regional differences in registration had little impact on the final result.
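To make the NCC measure concrete, the following is a minimal sketch of how such a score could be computed for two registered grayscale images. It is not the implementation used in this study: the function name `ncc`, the list-of-rows image format, and the `zero_mean` switch are assumptions made for illustration. The uncentered form shown by default stays between 0 and 1 for non-negative intensities, whereas the zero-mean variant ranges from -1 to 1.

```python
import math


def ncc(fixed, moving, zero_mean=False):
    """Normalized cross-correlation of two equally sized grayscale images.

    Images are lists of rows of non-negative intensities (illustrative format).
    """
    a = [float(v) for row in fixed for v in row]
    b = [float(v) for row in moving for v in row]
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and of equal size")
    if zero_mean:
        # Optional variant: subtract each image's mean before correlating.
        mean_a = sum(a) / len(a)
        mean_b = sum(b) / len(b)
        a = [v - mean_a for v in a]
        b = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
    if denom == 0.0:
        return 0.0  # a constant (or all-zero) image carries no structure
    return sum(x * y for x, y in zip(a, b)) / denom


image = [[10, 20], [30, 40]]
brighter = [[20, 40], [60, 80]]   # same structure, doubled brightness
flipped = [[40, 30], [20, 10]]    # different structure

print(ncc(image, brighter))  # uniform brightness scaling does not lower the score
print(ncc(image, flipped))   # mismatched structure gives a lower score
```

Because the score is divided by the norms of both images, a uniform change in brightness leaves it unchanged, which is the property the text refers to when it says NCC reduces the impact of illumination.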
During slide screening and classification model training, verification experiments had to be interposed continuously in order to improve the training efficiency and accuracy of the classification model. We found that a few non-standard pathological sections (such as those with IDC areas unsuitable for identification or positive areas of unexpected dimensions) could reduce the accuracy of the classification model. The main reason appeared to be that the classifier was affected by differences in the individual characteristics of the images, which were possibly greater than the differences in the classification characteristics. For instance, when the number of patches extracted from a WSI was particularly large or small, the features learned by the classification model might not represent the expected classification characteristics (such as the image characteristics of IDC) but might instead be peculiar to the individual image evaluated (such as color differences and/or impurities in that image). One remedy was to select a fixed number of patches per WSI for training (2,000 positive and 2,000 negative patches in our study) and to exclude the redundant patches from the training set. Therefore, when selecting slides, we had to choose suitable ones with an obvious IDC area of moderate size, which was more conducive to obtaining an accurate classification model. This showed that a verification step was essential; it required a constant exchange of experience between the pathology team and the computer engineering team, as well as close cooperation between these groups for troubleshooting.
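The per-WSI patch selection described above can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in this study: the function name `select_patches_per_wsi`, the `(patch_id, label)` tuple format, and the fixed random seed are hypothetical. Only the cap of 2,000 positive and 2,000 negative patches per WSI comes from the text.

```python
import random


def select_patches_per_wsi(patches, per_class=2000, seed=0):
    """Draw at most `per_class` positive and negative patches from one WSI.

    `patches` is a list of (patch_id, label) tuples, where label 1 marks an
    IDC patch and label 0 a non-IDC patch (illustrative format).
    """
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    selected = []
    for label in (1, 0):
        pool = [p for p in patches if p[1] == label]
        k = min(per_class, len(pool))  # small pools are used in full
        selected.extend(rng.sample(pool, k))
    return selected


# One hypothetical WSI with many positive and few negative patches.
wsi_patches = [(i, 1) for i in range(5000)] + [(i, 0) for i in range(5000, 5300)]
chosen = select_patches_per_wsi(wsi_patches)
print(sum(1 for _, label in chosen if label == 1))  # capped at 2000
print(sum(1 for _, label in chosen if label == 0))  # all 300 available
```

Capping the contribution of each slide in this way keeps a single very large WSI from dominating the training set with its own staining or artifact characteristics.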
Internationally, AI-aided automatic analysis has covered a variety of diseases, ranging from “benign” conditions such as diabetic retinopathy and Alzheimer’s disease [7] to malignant tumors such as breast cancer [26,27,28], lung cancer [29], liver cancer [30], skin cancer [31], osteosarcoma [32], and lymphoma [33, 34], with accuracy rates of 89.4–97.8% and AUC scores of 0.85–0.94 [7, 27, 31]. In addition, AI systems related to breast cancer have been applied to different aspects of IDC, such as histology-assisted and cytology-assisted diagnosis, mitotic cell counting, lymph node metastasis assessment [9, 10, 18, 22], breast cancer drug development, and others [8], with accuracy rates of 82.7–92.4% and an AUC score of 0.97 [27, 28]. This also indicates that AI-assisted pathological diagnosis and index counting are safe, effective, and feasible [35]. Notably, the accuracy of our IDC identification system was in line with these advanced international levels, and this model was a prerequisite for matching the IDC regions with the corresponding Ki-67 staining and for developing an automatic Ki-67 counting system. However, as far as we know, very few public databases contain whole-slide Ki-67 standards in which each positive and negative cell of the Ki-67-stained image has been accurately labelled. We will therefore publish the digital Ki-67 images in which each positive and negative cell was accurately labelled by pathologists during this study as an open public database for other interested researchers.
Factors that lead to poor reproducibility of Ki-67 scoring results may include the type of biopsy, time to fixation, type of antibody, method of reading, and area of reading [36,37,38,39]. To decrease this variability and improve the evaluation of Ki-67, many research institutions, including the International Ki-67 Working Group, have conducted a series of studies [36,37,38, 40]. According to the guidelines for the analysis, reporting, and use of Ki-67 proposed by the International Ki-67 in Breast Cancer Working Group, the Ki-67 score is defined as the percentage of positively stained invasive cancer cells in the examined region, and staining intensity is not relevant. Regarding the type of biopsy, both core-cut biopsies and whole tissue sections are suitable, but whole sections may give higher Ki-67 scores than core biopsies. Regarding antibody clones, such as MIB-1, MM-1, Ki-S5, SP6, and Ventana 30–9, most of the aforementioned studies have demonstrated that the most widely used and validated antibody is the MIB-1 clone [36,37,38]. Although some factors, such as the type of biopsy and the antibody clone mentioned above, may be correctable, others may be difficult to standardize. Inconsistency in the selection of the reading area on the slide is generally considered one of the important reasons for the poor reproducibility of Ki-67 immunohistochemistry scoring. Because of the heterogeneity of breast cancer, Ki-67-positive tumour cells are often unevenly distributed, forming hot spots and cold areas [37, 41]. Many published studies have shown that the Ki-67 score obtained by evaluating only the hot-spot area or the marginal area is significantly higher than that of the average area, the cold area, and the intermediate proliferation area, and that the Ki-67 score in the hot-spot area has a greater correlation with breast cancer prognosis [37, 39, 42].
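The effect of the reading area on the Ki-67 score can be illustrated with a short sketch. The per-field counts below are invented for illustration and are not data from this study, and the function names are hypothetical. Defining the hot-spot score as the highest per-field score and the average score as the percentage pooled over all counted fields is also an assumption, since scoring protocols differ in these details.

```python
def field_scores(fields):
    """Ki-67 score (percent positive cells) for each counted field.

    `fields` is a list of (positive_cells, total_cells) tuples.
    """
    return [100.0 * pos / total for pos, total in fields if total > 0]


def hotspot_score(fields):
    """Score of the most proliferative field (assumed hot-spot definition)."""
    return max(field_scores(fields))


def average_score(fields):
    """Score pooled over all counted fields (assumed average definition)."""
    positive = sum(pos for pos, _ in fields)
    total = sum(tot for _, tot in fields)
    if total == 0:
        raise ValueError("no cells were counted")
    return 100.0 * positive / total


# Illustrative counts for one heterogeneous section: hot spot, edge, cold area.
fields = [(120, 400), (40, 400), (20, 200)]
print(hotspot_score(fields))  # 30.0
print(average_score(fields))  # 18.0
```

On the same section, the two reading strategies give clearly different scores, which is why an inconsistent choice of reading area undermines reproducibility.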
The International Ki-67 Working Group currently recommends that at least three high-power fields (HPFs) be selected to represent the spectrum of staining seen on the initial overview of the entire section, that the invasive edge of the tumour be counted, and that, for the present, the average score across the section be used because of its greater reproducibility [36, 37, 39]. The number of cells counted is another factor affecting the reproducibility of Ki-67 scoring in breast cancer. The Ki-67 score obtained by counting 100 tumour cells will generally differ from that obtained by counting 1000 tumour cells on the same immunohistochemistry section. Although there is currently no uniform requirement for the total number of cells in Ki-67 scoring, many research institutions, including the International Ki-67 Working Group, have recommended that at least 1000 cells be scored and that 500 cells be accepted as the absolute minimum to achieve adequate precision [36, 39]. In the present study, Ki-67 was scored by the average method, and more than 1000 cells were counted on each Ki-67 section in both the manual counting and the AI stages, in order to achieve a harmonized methodology and to create greater between-laboratory and between-study comparability of the Ki-67 marker in breast cancer.
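The influence of the number of counted cells on precision can be sketched with a simple binomial approximation. This is an illustrative assumption, not an analysis performed in this study: it treats each counted cell as an independent draw, which real tissue only approximates, and the function names are hypothetical. The thresholds of 500 and 1000 cells are those cited in the text.

```python
import math

ABSOLUTE_MIN_CELLS = 500      # absolute minimum cited in the text
RECOMMENDED_MIN_CELLS = 1000  # recommended minimum cited in the text


def binomial_se_percent(proportion, n_cells):
    """Standard error, in percentage points, of a proportion from n_cells."""
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return 100.0 * math.sqrt(proportion * (1.0 - proportion) / n_cells)


def check_cell_count(n_cells):
    """Classify a cell count against the cited thresholds."""
    if n_cells < ABSOLUTE_MIN_CELLS:
        return "insufficient"
    if n_cells < RECOMMENDED_MIN_CELLS:
        return "acceptable minimum"
    return "recommended"


# Assumed true Ki-67 proportion of 20%, counted on 100, 500, and 1000 cells.
for n in (100, 500, 1000):
    print(n, check_cell_count(n), round(binomial_se_percent(0.2, n), 2))
```

Under this approximation the standard error shrinks with the square root of the cell count, so counting 1000 cells instead of 100 reduces the sampling uncertainty by a factor of about three.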