Comparison of histopathology to gene expression profiling for the diagnosis of metastatic cancer

Background: Determining the primary site of metastatic cancer with confidence can be challenging. Pathologists commonly use a battery of immunohistochemical (IHC) stains to determine the primary site. Gene expression profiling (GEP) has found increasing use, particularly in the most difficult cases. In this pilot study, a direct comparison between GEP and IHC-guided methods was performed.
Methods: Ten archived formalin-fixed paraffin-embedded metastatic tumor samples for which the primary site had been clinically determined were selected. Five pathologists who were blinded to the diagnosis were asked to determine the primary site using IHC and other stains selected from a panel of 84 stains. Each pathologist was provided patient sex, biopsy site and gross sample description only. Slides were digitized using ScanScope® XT at 0.25 μm/pixel. Each evaluating pathologist was allowed to provide a diagnosis in three stages: initial (after reviewing the H&E image), intermediate (after reviewing images from the first batch of stains) and final diagnosis (after the second batch of stains if requested). GEP was performed using the only FDA-cleared test for this intended use, the Pathwork Tissue of Origin Test. No sample information was provided for GEP testing except for patient sex. Results were reported as the tumor tissue type with the highest similarity score.
Results: In this feasibility study, GEP determined the correct primary site in 9 of the 10 cases (90%), compared to the IHC-guided method which determined the correct primary site for 32 of 50 case evaluations (average 64%, range 50% to 80%). The five pathologists directing the IHC-guided method ordered an average of 8.8 stains per case (range 1 to 18). GEP required an average of 3 slides per case (range 1 to 4).
Conclusions: Results of the pilot study suggest that GEP provides correct primary site identification in a higher percentage of metastatic cases than IHC-guided methods, and uses less tissue. A larger comparative effectiveness study using this study design is needed to confirm the results.
Virtual slides: The virtual slide(s) for this article can be found here: http://www.diagnosticpathology.diagnomx.eu/vs/1749854104745508


Background
Determining the primary site of metastatic cancer with confidence can be challenging; for 3-5% of cancer cases there is no clinically evident primary site [1][2][3][4][5][6][7]. Pathologists commonly use panels of immunohistochemical (IHC) and histochemical (HC) stains for diagnosis. A judicious use of lineage and organ-specific tissue markers is required to diagnose the primary site of metastatic cancer, as markers vary in levels of sensitivity and specificity [6]. In some cases, however, the primary site cannot be identified with certainty using conventional IHC evaluation. In a review and meta-analysis of published studies of IHC accuracy that were adequately blinded, four studies representing a total of 308 tumor samples reported an average accuracy of 67% on metastatic samples [8].
In recent years, gene expression profiling (GEP) tests and tests using microRNA markers have been developed to aid in the diagnosis of difficult-to-diagnose tumors [9][10][11].
These tests use formalin-fixed paraffin-embedded (FFPE) tissue and either microarrays or real-time polymerase chain reaction technologies to measure levels of multiple markers followed by application of an algorithm to predict the most likely primary site for a particular sample.
Recent reports discuss the need to coordinate the use of available methods of identifying primary site to optimize patient care [12,13]. However, to date no direct comparison of the accuracy of the GEP and IHC-based methods has been published; more data regarding comparative effectiveness of the gene expression- and IHC-based approaches are needed to fully understand the appropriate use of these two methods.
One of the GEP tests for primary site identification is the Tissue of Origin Test (Pathwork Diagnostics, Redwood City, California) [14][15][16][17]. We conducted a pilot study using 10 cases to compare accuracy of IHC-based diagnoses of primary site with diagnoses rendered by the Tissue of Origin Test. We also compared the amount of tissue used by each method, as preserving tissue is critical if additional diagnostic analyses are needed.
A web interface was developed which linked to a digital microscopy database to facilitate and standardize immunohistochemical evaluation by evaluating pathologists. Histology slides were digitized and the images were provided to evaluating pathologists for histopathological evaluation. Using digital pathology and secure web access provided a controlled means to allow access to only the stains requested by each pathologist and maintained the blinding of the pathologist.
Figure 1 Project workflow. A summary of the project workflow is provided starting from the initial screening of the tissue specimen and the associated clinical information. Once qualified, the blinded specimens were processed for analysis by evaluating pathologists as well as GEP. The results from each approach were compared after processing was complete.
The methods being compared have inherent differences that can confound a comparison: GEP is objective, independent of clinical history, based on algorithmic applications to a dataset, and is a one-time test, whereas histopathologic evaluation is subjective, reliant on clinical history, and often performed in consultation with other pathologists [18]. Stains are typically ordered in batches, and cost considerations may influence how many IHC stains a pathologist orders. We attempted to eliminate such confounding sources of variation. We placed no limit on the number of stains that could be ordered, and required individual pathologist assessments through a secure web interface that prevented access to evaluations being conducted by the other participating pathologists. Both GEP and histopathology were conducted on the same tissue block and the same restricted information of patient gender and minimal gross description of the specimen was made available for both methods.
This publication describes the novel methods used to conduct the study, reports the results of the pilot project, and provides information needed for conducting a more comprehensive study.

Methods
Archived human formalin-fixed paraffin-embedded (FFPE) specimens (blocks) containing metastatic tumors with a clinically established primary site were used. Samples were coded and both the evaluating pathologists and the laboratory performing the GEP at Pathwork Diagnostics were blinded to the primary site. The study was conducted under an Institutional Review Board-approved protocol. The overall study design and workflow are illustrated in Figure 1. Medical records from 2007 to 2011 were reviewed from two hospitals (The Regional Medical Center at Memphis, and Methodist University Hospital, Memphis, TN) to select cases with biopsy-proven metastatic cancer chosen by the Principal Investigator to resemble cases on which GEP may be used in clinical practice, i.e. where the diagnosis was not always obvious upon morphology review.

Specimen inclusion criteria
The selection criteria were as follows. (i) The sample represented an FFPE metastatic tumor with a known primary site as determined by review of the medical records. The determination of primary site was based on all available clinical information, but was not accepted when the diagnosis was made exclusively using immunohistochemistry (IHC) or special stains. (ii) All tumor samples were selected from a panel of 15 tumor tissue types covered by the Tissue of Origin Test. These types include: bladder (BL), breast (BR), colorectal (CO), gastric (GA), testicular germ cell (GC), kidney (KI), hepatocellular (LI), non-small cell lung (LU), non-Hodgkin's lymphoma (LY), melanoma (ME), ovarian (OV), pancreas (PA), prostate (PR), thyroid (TH) and sarcoma (SC). (iii) The FFPE tissues submitted on unstained slides or paraffin blocks were checked for adequacy of tissue required for the study. Samples had to be sufficient to produce at least 25 5-μm-thick (+/− 1 μm) sections: (a) the first and last sections for staining with hematoxylin and eosin (H&E) stain, (b) eight unstained slides (USS) containing no less than 1 mm² of tumor tissue for GEP, and (c) at least 15 unstained slides for IHC staining and analysis. (iv) H&E stained slides were evaluated by a board-certified pathologist to verify tumor content. All specimens were estimated to contain ≥ 60% non-necrotic tumor tissue (tumor and stroma) in the first and last H&E stained slide and (v) found to be consistent with the reported histology on quality review by a board-certified pathologist. If the number of stains ordered exceeded the number of slides that were cut initially, additional slides were prepared from the block, and the last slide was stained with H&E to verify that at least 60% tumor content remained.
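The tissue-adequacy requirements in criteria (iii) and (iv) reduce to a small amount of arithmetic: 2 H&E sections, 8 unstained slides for GEP and 15 for IHC give the minimum of 25 sections. The Python sketch below is illustrative only. The function name `meets_tissue_criteria` and its parameters are hypothetical and were not part of the study software; only the thresholds come from the criteria above.

```python
# Section budget from criterion (iii); the dictionary keys are illustrative labels.
SECTION_BUDGET = {
    "H&E (first and last section)": 2,
    "unstained slides for GEP": 8,
    "unstained slides for IHC": 15,
}
MIN_SECTIONS = sum(SECTION_BUDGET.values())  # 2 + 8 + 15 = 25
MIN_TUMOR_FRACTION = 0.60  # non-necrotic tumor content, criterion (iv)
MIN_TUMOR_AREA_MM2 = 1.0   # tumor area on the GEP slides, criterion (iii)(b)


def meets_tissue_criteria(sections_available: int,
                          tumor_fraction_first: float,
                          tumor_fraction_last: float,
                          tumor_area_mm2: float) -> bool:
    """Hypothetical helper: True when a block passes the tissue-adequacy checks."""
    return (
        sections_available >= MIN_SECTIONS
        and tumor_fraction_first >= MIN_TUMOR_FRACTION
        and tumor_fraction_last >= MIN_TUMOR_FRACTION
        and tumor_area_mm2 >= MIN_TUMOR_AREA_MM2
    )
```

The remaining criteria (known primary site, tissue type on the 15-tissue panel, histology review) require record review or pathologist judgment and are deliberately left out of this sketch.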

Staining procedure and quality control
All stains were performed in a CLIA-certified laboratory by a histotechnologist with more than 20 years of IHC staining experience. All IHC stains were performed on a Ventana Benchmark® LT automated immunohistochemical stainer. Relevant controls were included with each batch of stains. Prior to study initiation, a panel of 84 stains was agreed upon by all investigators, and made available for the study. This included 73 IHC stains and 11 histochemical stains (Table 1). Evaluating pathologists were free to order as many stains as desired from the list of stains in two rounds of requests.

Digitizing stained slides
All H&E slides and IHC stained slides were digitized using a Whole Slide Imaging (WSI) system (ScanScope® XT, Aperio® Technologies, Inc., San Diego, CA). All slides were scanned at 0.25 μm/pixel resolution, and the images saved in the password-protected database (Spectrum version 10.2.2.2314) provided by Aperio® on a web-accessible server. All digitized images were reviewed by a board-certified pathologist for quality assurance.

Selection of evaluating pathologists
Five board-certified pathologists with a wide range of experience (3 to 30+ years post pathology training) and from a diverse set of institutions (academic centers, community practice, and pathology reference laboratory) were selected for evaluating the cases. All evaluating pathologists confirmed previous use of digital pathology.

Web interface
A web-accessible user interface was designed (OneTera LLC, San Francisco, CA) for evaluating pathologists to access the images in the Spectrum database, order stains as needed, and record diagnoses and associated confidence levels. This information was collected at the following points: a) after review of the H&E slide alone b) after review of the first batch of stains, and c) after review of the second batch of stains. Each evaluating pathologist (EP) had secure password-protected access to the interface. All 10 cases were evaluated by each EP. One H&E image was provided to all EPs as the starting point for each case. Each pathologist had access only to IHC stains that they had ordered; if a stain was ordered by more than one pathologist, requestors were provided with a copy of the same digital image. This added control and reduced variability in interpretation that might be attributed to the staining procedure or tumor heterogeneity.

Prediction of primary site using IHC and special stains
The EPs were blinded to the clinical history and tissue of origin for the samples. They received only the patient's sex and gross sample description for each case, including biopsy site, given in Table 2. Each EP first reviewed the H&E image, recorded an Initial diagnosis (Stage 1) with a level of confidence, and ordered the first round of stains from the panel in Table 1. The digitized images for these stains were provided after two working days. The EP reviewed the stain images, recorded an Intermediate diagnosis (Stage 2) with a level of confidence, and ordered the second and final round of stains from the panel in Table 1. As before, the digitized images for these stains were provided after two working days. The EP reviewed the stain images and recorded the Final diagnosis (Stage 3) with a level of confidence. The EP had the opportunity to provide a final diagnosis at any of the three stages: after the first review of the H&E image, after the first round of stains, or after the second round of stains. If the EP chose to deliver a final diagnosis at Stage 1 or Stage 2, the EP was prompted by the software to verify that this was intentional. Once the final diagnosis was provided, the EP received a system-generated e-mail with a link to the case. The EP had the option to alter the final diagnosis within 24 hours of receiving this e-mail. After 24 hours, the case was considered closed for that EP, and no further change could be made. When two or more EPs ordered the same stain, the same stain image was provided to the second ordering EP with the standard two working day delay, to ensure that no participant would be able to infer that the stain had been requested by another. The rationale for this was to ensure that the digital pathology and web interface did not provide indirect "cross talk" between EPs, i.e. to eliminate inadvertent clues about stains ordered by others that might allow an individual EP to infer the direction another EP's investigation was following.
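The two timing rules in this protocol (a fixed release delay of two working days for every stain order, and a 24-hour window for amending a final diagnosis) can be sketched as follows. This is a minimal illustration and not the OneTera interface code. The function names are hypothetical, and the sketch assumes that working days are Monday to Friday with no holidays, which the source does not specify.

```python
from datetime import date, datetime, timedelta

STAIN_RELEASE_DELAY_DAYS = 2          # working days, applied to every order
AMEND_WINDOW = timedelta(hours=24)    # counted from the confirmation e-mail


def add_working_days(start: date, days: int) -> date:
    """Return the date `days` working days after `start` (assumes Mon-Fri, no holidays)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


def stain_release_date(ordered_on: date) -> date:
    """Same delay for every EP, whether or not the stain image already exists."""
    return add_working_days(ordered_on, STAIN_RELEASE_DELAY_DAYS)


def can_amend(email_sent: datetime, now: datetime) -> bool:
    """True while the final diagnosis can still be altered."""
    return now - email_sent <= AMEND_WINDOW
```

Applying the same delay even when an image is already available is what prevents an EP from inferring that another EP had ordered the stain.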

Gene expression profiling test methodology
The specimens were processed for the Pathwork® Tissue of Origin Test at Pathwork Diagnostics Laboratory (PWDL) as described previously [14]. Unstained slides contained an identifiable tumor region that was at least 1 mm² in area. To increase the percent tumor in the submitted sample, tumor tissue was microdissected (scraped) from the slides and placed into vials for RNA extraction. Total RNA was isolated using the Agencourt® FormaPure Kit (Beckman-Coulter Genomics, Beverly, MA). The total RNA was processed to prepare labeled cDNA for hybridization to Pathchip® microarrays manufactured by Affymetrix (Santa Clara, CA) with a two-cycle amplification method using the RampUP Kit (Genisphere, Hatfield, PA). A positive/negative total RNA control was run with every amplification batch. The microarrays were washed and stained using the GeneChip® Hybridization Wash and Stain kit in a GeneChip Fluidics Station FS450Dx, and scanned with a GeneChip Scanner 3000Dx (Affymetrix). Microarray data files (CEL) that passed data verification [14] were analyzed using the Tissue of Origin Test algorithm, a 2000-gene classification model which quantifies the similarity between RNA expression patterns of a study specimen and the 15 tissues on the test panel. Data were reported as Similarity Scores (SS) for each tissue, measures of the similarity of the RNA expression pattern of the specimen to the RNA expression pattern of the indicated tissue. Similarity Scores ranged from 0 (very low similarity) to 100 (very high similarity) and summed to 100 across all 15 tissues on the panel. The highest SS indicated the likely tissue of origin, with two exceptions: (i) if the highest score was less than 20, no tumor tissue type was predicted, and (ii) if the patient was male, and the highest SS was for ovarian cancer, and the second highest SS was for testicular germ cell cancer, the result was testicular germ cell cancer. For any tissue type with SS of ≤ 5, the possibility of that particular tissue type as the likely tissue of origin was ruled out. The Tissue of Origin Test result was automatically generated by the computer algorithm using only gene expression values as input. No clinical history, reference diagnosis or biopsy site information was used.
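The reporting rule applied to the Similarity Scores can be summarized in a short Python sketch. It covers only the final step described above (choosing a result from the 15 scores) and is not the proprietary 2000-gene classification model. The function names and the `"M"` sex code are assumptions made for illustration; the tissue codes follow the abbreviations in the inclusion criteria.

```python
from typing import Dict, Optional, Set

NO_CALL_THRESHOLD = 20.0   # highest SS below this: no tissue type predicted
RULE_OUT_THRESHOLD = 5.0   # SS at or below this: tissue type ruled out


def predict_primary_site(scores: Dict[str, float], sex: str) -> Optional[str]:
    """Return the predicted tissue code, or None when no prediction is made."""
    ranked = sorted(scores, key=lambda tissue: scores[tissue], reverse=True)
    top = ranked[0]
    # Exception (i): the highest score is too low to support a prediction.
    if scores[top] < NO_CALL_THRESHOLD:
        return None
    # Exception (ii): ovarian ranked first in a male patient, germ cell second.
    if sex == "M" and top == "OV" and len(ranked) > 1 and ranked[1] == "GC":
        return "GC"
    return top


def ruled_out_tissues(scores: Dict[str, float]) -> Set[str]:
    """Tissue types excluded as the likely origin (SS of 5 or less)."""
    return {tissue for tissue, ss in scores.items() if ss <= RULE_OUT_THRESHOLD}
```

For example, with made-up scores of 60 for LU, 25 for BR, 10 for OV and 5 for BL, the sketch returns `"LU"` as the prediction and `{"BL"}` as the only tissue ruled out.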
It should be noted that in clinical practice, a PWDL pathologist provides an interpretation of the test results along with a confidence level. This is based on test performance information derived from analyses of results from the clinical validation study [14], as well as histopathologic appearance, and relevant clinical information. In this study, the primary analysis was conducted using the highest SS. A PWDL pathologist recorded an interpretation of the results while blinded to the primary site, and this information was available for secondary analyses.

Results
Ten metastatic specimens were selected from among common solid malignancies representing nine different primary sites, and four metastatic sites, as shown in Table 2. All specimens had at least 60% non-necrotic tumor content.
Five pathologists reviewed 10 cases each, for a total of 50 case reviews. The average number of stains ordered by an EP in the first round was 7.06. For 21 of 50 (42%) of the case reviews the EPs ordered a second round of stains with an average of 4.2 stains for each of these cases. From among the 442 total stains ordered by all the EPs for all the cases, an average of 22 unique stains per case was ordered by the group of five EPs. The EPs ordered only IHC stains; none of the 11 histochemical stains was ordered. The number of stains ordered for each case by individual EPs is shown in Table 3 and the twenty most commonly ordered stains are shown in Table 4. In decreasing order of frequency, the most commonly ordered stains were CK 7, CK 20, TTF-1, CDX-2, PAX-8, estrogen receptor, napsin A, PAX-2, p63, and synaptophysin. The average number of stains ordered by an EP per case was 8.8 (median 8, range 1 to 18).
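The stain counts above are mutually consistent, as a quick arithmetic check shows. The Python sketch below uses only the figures reported in this paragraph; the variable names are illustrative, and the small gap between the reconstructed and reported totals is what rounding of the two averages would produce.

```python
CASE_REVIEWS = 5 * 10            # five EPs, ten cases each
FIRST_ROUND_MEAN = 7.06          # stains per case review, first round
SECOND_ROUND_REVIEWS = 21        # case reviews with a second round
SECOND_ROUND_MEAN = 4.2          # stains per second-round review
TOTAL_STAINS_REPORTED = 442

first_round_total = FIRST_ROUND_MEAN * CASE_REVIEWS              # about 353
second_round_total = SECOND_ROUND_MEAN * SECOND_ROUND_REVIEWS    # about 88
reconstructed_total = first_round_total + second_round_total     # about 441

mean_per_review = TOTAL_STAINS_REPORTED / CASE_REVIEWS           # 8.84

print(f"reconstructed total: {reconstructed_total:.1f}")
print(f"mean stains per case review: {mean_per_review:.2f}")
```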
The diagnoses of primary site reached by each EP and the predictions of primary site by the GEP test are shown in Table 5. Following review of H&E slides alone, EPs reached the correct diagnosis of primary site for 21 of 50 case reviews (42% accuracy). Following review of the first round of IHC stains, the EPs reached the correct diagnosis of primary site in 31 of 50 case reviews (62% accuracy). Following the review of all IHC stains (either one round for 29 cases or two rounds of IHC for 21 cases), the EPs reached the correct diagnosis of primary site in 32 of 50 case reviews (64% accuracy). Accuracy among EPs ranged from 50% to 80%. For each EP, the accuracy increased considerably between the H&E and first round, but not between the first and second round (Table 5). In 29 out of 50 case reviews, EPs provided a final diagnosis after ordering only the first batch of stains. The average accuracy for these diagnoses was 66% while for the 21 case reviews for which two rounds of stains were ordered, the average accuracy was 62%.
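The stage-by-stage accuracies follow directly from the counts of correct case reviews. The Python sketch below recomputes them from the numbers in this paragraph and checks that the two subgroups (final diagnosis after one or two rounds of stains) add up to the 32 correct final diagnoses. Variable names are illustrative, and the subgroup counts are recovered by rounding the reported percentages, which is an assumption.

```python
TOTAL_REVIEWS = 50  # five EPs x ten cases

# Correct case reviews at each stage, as reported above.
correct_by_stage = {"H&E only": 21, "after first round": 31, "final": 32}
accuracy_by_stage = {
    stage: correct / TOTAL_REVIEWS for stage, correct in correct_by_stage.items()
}

# Subgroups of the final diagnoses: one round (29 reviews, 66% correct)
# and two rounds (21 reviews, 62% correct). Counts recovered by rounding.
one_round_correct = round(29 * 0.66)
two_round_correct = round(21 * 0.62)

gep_accuracy = 9 / 10

for stage, accuracy in accuracy_by_stage.items():
    print(f"{stage}: {accuracy:.0%}")
print(f"final correct from subgroups: {one_round_correct + two_round_correct}")
print(f"GEP: {gep_accuracy:.0%}")
```

The gain from the first round of stains (21 to 31 correct reviews) is much larger than the gain from the second round (31 to 32), which is the pattern noted in the text.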
The variation in the total number of stains requested by all 5 EPs per case may be indicative of relative diagnostic complexity. Using this measure, case 3 was the simplest, and all EPs provided a final diagnosis without ordering a second round of stains. One EP diagnosed the case with as few as 3 stains. The average number of stains used by all EPs was 5 (median 4, range 3-7). Case 6 was the most complex: no EP required fewer than 8 stains, and the average across all EPs was 12.2 (median 11, range 8 to 18). For both cases, all EPs and the GEP test correctly determined the primary site.
The GEP test determined the correct diagnosis in 90% (9/10) of cases. The range of highest SS was 24.4 to 97.6 (Table 5). The average number of slides used for the GEP test was three (median 2, range 2 to 6; Table 3). For cases 3 and 6, the numbers of slides used for the GEP test were 2 and 6, respectively.

Discussion
We have devised and tested a novel approach for directly comparing a GEP test to special stain evaluation using multiple evaluating pathologists. The study design reduced subjectivity of the pathologic diagnosis and minimized the variables between the two approaches being compared. The use of a single central laboratory to perform IHC staining and digital pathology to provide the results to multiple participating pathologists eliminated the need for shipping slides from one site to another for pathologist evaluation, thus enabling more efficient study logistics and timelines. The whole slide imaging system created a digital replica of the entire content of a glass microscope slide on the computer, closely emulating traditional viewing of a slide with a conventional microscope [19]. EPs could zoom in or pan out of the web-accessible, interactive images for evaluation. The system also allowed proper control, providing access to only the stains requested by each pathologist, maintaining sample blinding and preventing indirect "cross-talk" between EPs. All the EPs were given access to images after 48 hours of ordering stains, eliminating the possibility of indirect clues to EPs regarding stains ordered by other EPs. All the slides were stained in one CLIA-certified laboratory by an experienced histotechnologist using the same IHC staining instrument, thus avoiding technical variability. All the slides were digitized using only one scanner and the images were checked for quality assurance by an experienced board-certified pathologist, creating uniformity in the image quality. All the images were web-accessible using a password-protected database, allowing uniformity in evaluation to all the EPs. In this study, there was no restriction on the number of stains that could be ordered.
Table 4 The number of EPs ordering each stain is shown by stain and case for the 20 stains most commonly ordered
The studies in the meta-analysis [8] restricted the number of markers or IHCs in a panel to between 4 and 10, and the number of tissues of origin represented was limited to between 5 and 7. In clinical practice, GEP is used as an aid to diagnosis that the pathologist uses along with all the other histopathologic and clinical information to arrive at a diagnosis. The stringent study design that we created allowed direct comparison between GEP and histopathologic evaluation. All 5 EPs and the GEP test provided a final diagnosis on all 10 cases. Case 3 was of very low morphological difficulty and all EPs provided a final diagnosis that matched the reference diagnosis while ordering the smallest number of stains (total stains 25 among five EPs). Cases 4 and 10 were incorrectly diagnosed by all 5 EPs and warrant further discussion. The patient information provided to all the EPs and the GEP testing laboratory is shown in Table 2. For case 4, the primary site was correctly predicted by the GEP test but missed by all 5 EPs. EPs ordered a total of 61 stains, the most common being TTF1, CK7, CK20, CDX-2, PAX-2, ER and GCDFP-15. Case 4 has a somewhat confounding immunophenotype and an inconclusive appearance on H&E evaluation. In the GEP test clinical validation study [14], it was shown that there is a strong positive relationship between the Similarity Score and the probability that the Tissue of Origin (TOO) test prediction is correct. The highest SS generated by the GEP test was 97.6 (out of a possible 100), indicating a very high confidence in the prediction. For Case 10, neither the diagnoses reached by the EPs nor the TOO prediction matched the reference diagnosis of lung (squamous cell carcinoma). All EPs delivered a final diagnosis of bladder. Relevant IHC results for this case include positive CK7 and negative CK20, TTF-1, and uroplakin. These IHC results are fully consistent with the reference diagnosis.
It is possible that the H&E appearance of this neoplasm (lack of keratinization and ribbon-like growth pattern) encouraged the EPs' diagnosis of urothelial carcinoma. For Case 10, the highest SS generated was 24.4, indicating a relatively low confidence of prediction. Neither breast cancer (SS 22.9) nor non-small cell lung cancer (SS 20.8) was excluded by the TOO test results, and both would be considered possible primary sites. Bladder cancer has a very low score of 5.2, and while not formally ruled out (i.e. SS ≤ 5), would be considered highly unlikely as a primary site.
The PWDL pathologist interpreting the GEP results favored a non-small cell lung origin, since a prediction of ovarian was implausible in males and the pattern of scores was most consistent with a non-small cell lung origin. The full value of GEP and other novel molecular evaluations in oncology will most likely be achieved by a judicious incorporation into the final pathologist consultation report [13].
The amount of tissue used by each testing method is an important consideration in evaluating tissue-based diagnostics. GEP used an average of three slides per case, whereas the EPs used an average of nine slides per EP per case. In this sample set, molecular testing required less tissue and reduced the risk of tissue depletion. For some cases (1, 4, 6, 7, 8 and 9) EPs ordered a large number of stains (average 11 slides per EP per case); whereas in cases 2, 3, 5 and 10 they ordered fewer IHC stains (average 6 slides per EP per case). The GEP test used 3 slides on average for both sets.
In this pilot study GEP performed favorably (90% accuracy) compared to histopathologic evaluation (average 64% accuracy) by 5 EPs. It is interesting to note that the average accuracy seen in this study is very similar to the average 67% accuracy reported in the previous meta-analysis [8]. In this study, there was no restriction on the number of IHCs, whereas the studies in the meta-analysis restricted the number of stains to between 4 and 10. While the sample numbers are small and do not support statistical analyses, this pilot study has established feasibility for a study with larger sample size that will be adequately powered to derive statistically significant conclusions regarding the comparative effectiveness of the two approaches.

Conclusions
This pilot study of 10 samples found important differences in accuracy between two methods. In the 10 metastatic samples reported here, GEP identified the correct primary site more often than the IHC-guided methods used by the pathologists participating in the study. This study design will be applied to a larger set of samples to provide a statistically powered assessment of the comparative effectiveness of the GEP and IHC-guided methods.

Competing interests
CRH, AK and AME were recipients of research grants from Pathwork Diagnostics for the performance of this study. WDH and RP are employees and stockholders of Pathwork Diagnostics.

Authors' contributions
CRH was the Principal Investigator. CRH and AK participated in the conception, study design, sample identification, results interpretation, and correlation with clinical information and manuscript writing. AME participated in the sample identification, data management, results interpretation, and manuscript writing. RP and WDH participated in the conception, study design, conduct of the study, data management, results interpretation, and manuscript preparation. All authors read and approved the final manuscript.