We have devised and tested a novel approach for directly comparing a GEP test to special stain evaluation using multiple evaluating pathologists. The study design reduced the subjectivity of the pathologic diagnosis and minimized the variables between the two approaches being compared. The use of a single central laboratory to perform IHC staining, together with digital pathology to deliver the results to multiple participating pathologists, eliminated the need to ship slides from one site to another for pathologist evaluation, enabling more efficient study logistics and timelines. The whole slide imaging system created a digital replica of the entire content of a glass microscope slide on the computer, closely emulating traditional viewing of a slide with a conventional microscope. EPs could zoom in or pan out of the web-accessible, interactive images for evaluation. The system also allowed proper access control, giving each pathologist access only to the stains that pathologist had requested, maintaining sample blinding and preventing indirect “cross-talk” between EPs. All EPs were given access to images 48 hours after ordering stains, eliminating the possibility of indirect clues regarding stains ordered by other EPs. All slides were stained in one CLIA-certified laboratory by an experienced histotechnologist using the same IHC staining instrument, thus avoiding technical variability. All slides were digitized using a single scanner, and the images were checked for quality assurance by an experienced board-certified pathologist, ensuring uniform image quality. All images were web-accessible through a password-protected database, giving every EP the same conditions for evaluation. In this study, there was no restriction on the number of stains that could be ordered. The studies in the meta-analysis restricted the number of markers or IHCs in a panel to between 4 and 10, and the number of tissues of origin represented was limited to between 5 and 7. In clinical practice, GEP is an aid to diagnosis that the pathologist uses along with all the other histopathologic and clinical information to arrive at a diagnosis. The stringent study design that we created allowed direct comparison between GEP and histopathologic evaluation.
All 5 EPs and the GEP test provided a final diagnosis on all 10 cases. Case 3 was of very low morphological difficulty, and all EPs provided a final diagnosis that matched the reference diagnosis while ordering the smallest number of stains (25 stains in total across the 5 EPs). Cases 4 and 10 were incorrectly diagnosed by all 5 EPs and warrant further discussion. The patient information provided to all the EPs and the GEP testing laboratory is shown in Table 2. For Case 4, the primary site was correctly predicted by the GEP test but missed by all 5 EPs. EPs ordered a total of 61 stains, the most common being TTF1, CK7, CK20, CDX-2, PAX-2, ER and GCDFP-15. Case 4 has a somewhat confounding immunophenotype and an inconclusive appearance on H&E evaluation. The GEP test clinical validation study showed a strong positive relationship between the Similarity Score (SS) and the probability that the TOO test prediction is correct. For this case, the highest SS generated by the GEP test was 97.6 (out of a possible 100), indicating very high confidence in the prediction. For Case 10, neither the diagnoses reached by the EPs nor the TOO prediction matched the reference diagnosis of lung (squamous cell carcinoma). All EPs delivered a final diagnosis of bladder. Relevant IHC results for this case include positive CK7 and negative CK20, TTF-1, and uroplakin. These IHC results are fully consistent with the reference diagnosis. It is possible that the H&E appearance of this neoplasm (lack of keratinization and a ribbon-like growth pattern) encouraged the EPs' diagnosis of urothelial carcinoma. For Case 10, the highest SS generated was 24.4, indicating relatively low confidence in the prediction. Neither breast cancer (SS 22.9) nor non-small cell lung cancer (SS 20.8) was excluded by the TOO test results, and both would be considered possible primary sites. Bladder cancer had a very low score of 5.2 and, while not formally ruled out (which requires SS < 5), would be considered highly unlikely as a primary site.
The PWDL pathologist interpreting the GEP results favored a non-small cell lung origin, since a prediction of ovarian origin was implausible in a male patient and the pattern of scores was most consistent with a non-small cell lung origin. The full value of GEP and other novel molecular evaluations in oncology will most likely be achieved by their judicious incorporation into the final pathologist consultation report.
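The reading of the Case 10 scores described above can be illustrated with a minimal sketch. It assumes only what the text states: a site is formally ruled out when its SS is below 5, and higher scores indicate higher confidence. The function name and data layout are hypothetical and are not part of the GEP test's software. The top-scoring site (SS 24.4) is omitted because the text does not name it alongside its score.

```python
# Illustrative sketch only; names and structure are assumptions,
# not the GEP test's actual reporting logic.
RULE_OUT_THRESHOLD = 5.0  # from the text: a site is formally ruled out when SS < 5


def rank_candidate_sites(scores, threshold=RULE_OUT_THRESHOLD):
    """Return (site, score) pairs that are not ruled out, highest score first."""
    kept = [(site, ss) for site, ss in scores.items() if ss >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Similarity Scores reported in the text for Case 10.
case_10_scores = {
    "bladder": 5.2,
    "non-small cell lung": 20.8,
    "breast": 22.9,
}

ranked = rank_candidate_sites(case_10_scores)
for site, ss in ranked:
    print(f"{site}: SS {ss}")
```

A ranking like this is only a starting point. As the Case 10 discussion shows, the interpreting pathologist also weighs clinical plausibility and the overall pattern of scores.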
The amount of tissue used by each testing method is an important consideration in evaluating tissue-based diagnostics. GEP used an average of 3 slides per case, whereas the EPs used an average of 9 slides per EP per case. In this sample set, molecular testing required less tissue and reduced the risk of tissue depletion. For some cases (1, 4, 6, 7, 8 and 9), EPs ordered a large number of stains (average 11 slides per EP per case), whereas for cases 2, 3, 5 and 10 they ordered fewer IHC stains (average 6 slides per EP per case). The GEP test used 3 slides on average for both sets.
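As a quick consistency check on the slide counts above, the two reported subgroup averages can be combined into a case-weighted overall average. This is a sketch using only the rounded figures given in the text, so the result is approximate; the variable names are illustrative.

```python
# Consistency check on the reported slide usage, using the rounded
# averages from the text (so the result is approximate).
high_stain_cases = [1, 4, 6, 7, 8, 9]   # average 11 slides per EP per case
low_stain_cases = [2, 3, 5, 10]         # average 6 slides per EP per case
HIGH_AVG_SLIDES = 11
LOW_AVG_SLIDES = 6
GEP_AVG_SLIDES = 3

total_cases = len(high_stain_cases) + len(low_stain_cases)

# Case-weighted mean of the two subgroup averages.
overall_ep_avg = (
    len(high_stain_cases) * HIGH_AVG_SLIDES
    + len(low_stain_cases) * LOW_AVG_SLIDES
) / total_cases

# How many times more slides one EP used per case than the GEP test.
ep_to_gep_ratio = overall_ep_avg / GEP_AVG_SLIDES

print(f"Overall EP average: {overall_ep_avg} slides per EP per case")
print(f"EP-to-GEP slide ratio: {ep_to_gep_ratio}")
```

The weighted mean reproduces the reported overall average of 9 slides per EP per case, roughly three times the tissue used by the GEP test.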
In this pilot study, GEP performed favorably (90% accuracy) compared with histopathologic evaluation by 5 EPs (average 64% accuracy). Notably, the average EP accuracy seen in this study is very similar to the average 67% accuracy reported in the previous meta-analysis. In this study, there was no restriction on the number of IHCs, whereas the studies in the meta-analysis restricted the number of stains to between 4 and 10. While the sample numbers are small and do not support statistical analyses, this pilot study has established the feasibility of a larger study that will be adequately powered to derive statistically significant conclusions regarding the comparative effectiveness of the two approaches.