
PD-L1 diagnostic tests: a systematic literature review of scoring algorithms and test-validation metrics



The programmed death receptor 1 (PD-1) protein is a cell-surface receptor on certain lymphocytes that, with its ligand programmed death ligand 1 (PD-L1), helps to down-regulate immune responses. Many cancer types express PD-L1 and evade immune recognition via the PD-1/PD-L1 interaction. Precision therapies targeting the PD-1/PD-L1 pathway have the potential to improve response and thereby offer a novel treatment avenue to some patients with cancer. However, this new therapeutic approach requires reliable methods for identifying patients whose cancers are particularly likely to respond. Therefore, we conducted a systematic literature review assessing evidence on test validation and scoring algorithms for PD-L1 immunohistochemistry (IHC) tests that might be used to select potentially responsive patients with bladder/urothelial cell, lung, gastric, or ovarian cancers for immunotherapy treatment.

Methods and results

To identify evidence on commercially available PD-L1 IHC assays, we systematically searched MEDLINE and Embase for relevant studies published between January 2010 and September 2016 and appraised abstracts from recent oncology conferences (January 2013 to November 2016). Data from publications that met the predefined inclusion criteria were extracted, and key trends were summarized.

In total, 26 eligible primary studies were identified, all of which reported test-validation metrics for PD-L1 IHC tests in lung cancer. There was significant heterogeneity among the available tests for PD-L1. Specifically, no definitive cutoff for PD-L1 positivity was identifiable, with more than one threshold being reported for most antibodies. Studies also differed as to whether they evaluated tumor cells only or tumor cells and tumor-infiltrating immune cells. However, all of the tests developed and validated to support a therapeutic drug in the context of phase 2–3 clinical trials reported more than 90% inter-reader concordance. In contrast, other PD-L1 antibodies identified in the literature reported poorer concordance.


Published validation metric data for PD-L1 tests are mainly focused on immunohistochemistry tests from studies in lung cancer. The variability in test cutoffs and standards for PD-L1 testing suggests that there is presently no standardized approach. This current variability may have implications for the uptake of precision treatments.


Checkpoint inhibitor therapy is a recent development in the field of cancer immunotherapy and precision medicine, and involves targeting immune pathways that enhance the body’s ability to recognize and destroy tumor cells (TCs). One key mediator in such pathways is the programmed death receptor 1 (PD-1) protein, a cell-surface receptor on certain lymphocytes. The interaction between PD-1 and its ligand, programmed death ligand 1 (PD-L1), plays a crucial regulatory role in the human immune system by inhibiting the body’s immune response to foreign antigens. However, many cancer cell types express PD-L1 and thereby activate PD-1/PD-L1 signaling, thus enabling these tumors to evade immune recognition. Precision therapies that focus on the PD-1/PD-L1 pathway can offer a novel treatment avenue to some patients with cancer. Five PD-1/PD-L1 immunotherapies (atezolizumab, avelumab, durvalumab, nivolumab and pembrolizumab) have now been approved by the United States (US) Food and Drug Administration (FDA) and/or European Medicines Agency (EMA) for a variety of indications following the publication of clinical trials demonstrating their efficacy in improving therapeutic response.

Although research into the effectiveness of these types of immunotherapy is rapidly evolving, there remains some uncertainty regarding the extent to which measuring levels of PD-L1 expression in individuals’ tumor tissue helps to identify patients who are most likely to respond to treatment. For example, in Hodgkin’s lymphoma, most tumors have been reported to express PD-L1, so assessing expression in patients can contribute only minimally to clinical decision-making about suitability for treatment [1]. However, for a specific group of cancers (e.g., non-small cell lung cancer), evidence suggests that responsiveness to PD-1 inhibitors such as pembrolizumab and nivolumab or to the anti-PD-L1 antibodies atezolizumab and durvalumab may be predicted by expression of PD-L1 on TCs and/or tumor-infiltrating immune cells (ICs) [1]. Therefore, tests detecting PD-L1 expression may play an important role in the use and development of anti PD-1/PD-L1 agents aimed at these tumor types, which include bladder/urothelial cell, lung, gastric, and ovarian cancer.

Currently there are a range of commercially available PD-L1 IHC tests. Tests are typically designated by the antibody clone that is used to detect the presence of the PD-L1 protein; for example, the 22C3 test developed by Dako (PD-L1 IHC 22C3 pharmDx, Agilent Pathology Solutions) uses a monoclonal mouse anti–PD-L1 clone, 22C3. Some of the available tests have been developed and validated as part of clinical trials that were used to demonstrate the efficacy of the aforementioned licensed PD-1/PD-L1 immunotherapy medicines. Tests of this type can be further sub-divided into two types: companion diagnostics, which, per the US Food and Drug Administration (FDA) definition, provide information, often obtained in vitro, that is “essential for the safe and effective use of a corresponding drug or biologic product” [2], and complementary (or co-diagnostic) tests that may be used in treatment selection, but are not considered essential for safe and effective use of the corresponding therapy in practice. A key distinction between companion and complementary diagnostics is that, whereas companion diagnostics are tied to a specific drug within its approved label, complementary or co-diagnostics may be associated with particular drugs but are not included in the licensing indications for those drugs. Of note, IHC-22C3 for pembrolizumab is currently the only FDA-approved companion diagnostic for PD-1/PD-L1 targeted immunotherapies.
Furthermore, although pembrolizumab is now licensed for multiple indications, the FDA only recommends IHC-22C3 for treatment selection in the following specific groups: patients with previously untreated metastatic non-squamous non-small cell lung cancer (NSCLC) whose tumors express PD-L1 at a level of 50% or higher (or second-line NSCLC patients with ≥1% expression), and patients with recurrent locally advanced or metastatic gastric or gastroesophageal junction adenocarcinoma who have a Combined Positive Score (CPS; a measure based on the number of PD-L1–stained tumor cells, lymphocytes, and macrophages relative to the number of viable tumor cells) of ≥1. Other tests, such as IHC 28–8, SP142, and SP263 for nivolumab, atezolizumab, and durvalumab, respectively, are regarded as complementary diagnostics and are not considered by the FDA as being essential for safe and effective treatment selection.
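For illustration only, the CPS arithmetic described above can be sketched as follows; this is a minimal sketch assuming the commonly published formula (PD-L1–staining cells divided by viable tumor cells, multiplied by 100 and conventionally capped at 100), and the function and variable names are hypothetical:

```python
def combined_positive_score(stained_tumor_cells, stained_lymphocytes,
                            stained_macrophages, viable_tumor_cells):
    """Sketch of the Combined Positive Score (CPS) arithmetic.

    CPS = (PD-L1-staining tumor cells + lymphocytes + macrophages)
          / viable tumor cells * 100, conventionally capped at 100.
    """
    if viable_tumor_cells <= 0:
        raise ValueError("viable tumor cell count must be positive")
    stained = stained_tumor_cells + stained_lymphocytes + stained_macrophages
    return min(100.0, 100.0 * stained / viable_tumor_cells)

# A specimen with 2 stained tumor cells, 3 stained lymphocytes, and
# 1 stained macrophage among 200 viable tumor cells: CPS = 3.0,
# which meets the >=1 threshold described above.
cps = combined_positive_score(2, 3, 1, 200)
```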

The landscape of available potential PD-L1 diagnostic tests is further complicated by the fact that each test has its own antibody detection system and tests are performed using different platforms. As a result, the extent to which particular tests are either interchangeable across different indications or superior in terms of accuracy can be important to both uptake of PD-1/PD-L1 targeted therapies and use of these tests for patient management decisions. To provide insights into this area and to help identify and address potential knowledge gaps, a systematic literature review (SLR) was conducted to characterize the different tests and to examine the validity of commercially available PD-1/PD-L1 tests in assessing bladder/urothelial cell, lung, gastric, and ovarian cancers.


This review explored the characteristics of commercially available PD-L1 tests currently in use for bladder/urothelial cell, lung, gastric, and ovarian cancers, by addressing the following specific research questions:

  • What types of tests, platforms, and scoring algorithms are currently being used?

  • How has the validity of these tests, platforms, and scoring algorithms been tested?


The SLR was conducted in accordance with the methods outlined in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.

Systematic searches were conducted in MEDLINE® (via PubMed) and Embase® for studies published in English between January 1, 2010 and September 15, 2016. Medical Subject Headings (MeSH), EMTREE terms, and free-text terms were used and combined, where appropriate, with Boolean operators (“AND”, “OR”, and “NOT”). Key search terms included text variations on biomarkers of interest, such as “programmed death-ligand,” “PDL1,” “PD-L1,” and relevant validation metrics, such as “Sensitivity and Specificity” (MeSH) and “valid*.” (The MEDLINE search strategy is provided in a supplementary appendix.) Two searches were run; the second, supplementary search used the same core algorithm but with some additional terms (for example, “correlat*” and “immunohistochemistry” [MeSH]) to ensure the search was comprehensive.

Supplementary searches were undertaken to capture ‘grey’ literature—data from sources not indexed in the electronic databases. To capture such evidence, proceedings from the three most recent meetings of the following six subject-specific conferences were searched:

  1. American Society of Clinical Oncology (ASCO)

  2. European Society for Medical Oncology (ESMO)

  3. Society for Immunotherapy of Cancer (SITC)

  4. International Cancer Immunotherapy Conference

  5. American Association of Cancer Research (AACR)

  6. International Association for the Study of Lung Cancer (IASLC)

Study selection was based on criteria that were defined a priori and are summarized in Table 1. The titles and abstracts of records retrieved via the literature searches were first appraised by a single reviewer, and 10% of the screening decisions made at this level were checked by a second reviewer to confirm their accuracy, as a quality control measure. Relevant studies that passed this first round of screening then underwent full-text screening, which was conducted by two reviewers to confirm each inclusion and exclusion decision. Any discrepancies at the abstract and full-text level were resolved through discussion with a third reviewer, where necessary.

Table 1 Criteria for Study Selection

Data abstraction of the included studies was performed using a predefined data abstraction template designed in Microsoft Excel®. For each included study, data were captured by a single investigator, with validation of the accuracy and completeness of this abstraction being performed by a second reviewer. Any discrepancies were resolved in a discussion with a third investigator. Specific key information was abstracted from included studies on the following: patient population, type of test, test developer, test platform, test-scoring algorithms, test thresholds/cutoffs, and test-validation metrics. Due to the variety of study designs considered in this review, it was not possible to undertake a risk-of-bias assessment using a single standardized tool. Heterogeneity in the studies also meant that a quantitative meta-analysis of their data was not appropriate; therefore, the evidence abstracted from included studies was qualitatively synthesized and key trends were summarized.


Search results

The indexed database searches yielded 950 records. After removing publications duplicated between databases, 589 abstracts remained and were screened, of which 57 met the criteria for detailed review of their associated full-text publications. Of these 57 publications subjected to full-text screening, 12 were eligible for inclusion in the SLR, as they reported on PD-L1 test validation metrics for commercially available tests. An additional eight studies were identified from the supplementary search and 10 conference abstracts also met the eligibility criteria. Therefore, a total of 30 references (collectively representing 26 unique study populations and four linked publications) were included in the review. The study screening and selection process is illustrated in Fig. 1.

Fig. 1 Screening and Study Selection

All 26 included studies reported on test validation metrics associated with PD-L1 tests in lung cancer. One of the studies also reported data relating to bladder/urothelial cell cancer [3]. No evidence relating to gastric or ovarian cancer was identified.

Lung cancer

Types of PD-L1 antibody tests identified in the SLR

Across the 26 included studies, eight antibodies for detecting PD-L1 expression in patients with lung cancer were identified, as follows:

  • PD-L1 IHC 22C3 pharmDx by Dako (referred to hereafter by the antibody 22C3): 3 studies [4,5,6]

  • PD-L1 IHC 28–8 pharmDx by Dako (referred to hereafter by the antibody 28–8): 7 studies [6,7,8,9,10,11,12]

  • VENTANA PD-L1 (SP263) Rabbit Monoclonal Primary Antibody by Roche (referred to hereafter by the antibody SP263): 6 studies [6,7,8, 13,14,15]

  • VENTANA PD-L1 (SP142) Assay by Roche (referred to hereafter by the antibody SP142): 9 studies [3, 6, 8, 9, 16,17,18,19,20]

  • PD-L1 (E1L3N®) XP® Rabbit mAb #13684 by Cell Signaling Technology [CST] (a reagent provider): 9 studies [8, 11, 15, 20,21,22,23,24,25]

  • 4059 by ProSci, Inc.: 1 study [26]

  • h5H1 by Advanced Cell Diagnostics: 1 study [27]

  • 9A11 (developer not reported): 1 study [8]

In all cases, PD-L1 expression was evaluated using an immunohistochemistry (IHC) platform. One of the studies specified that diaminobenzidine tetrahydrochloride was used as the reagent to produce the “brown staining” for the IHC process [8]. Three studies evaluated results derived from alternative test platforms as well as IHC. Two studies [8, 20] measured PD-L1 expression using quantitative fluorescence (QIF) and another study looked at fluorescence in-situ hybridization (FISH) [12].

The antibodies manufactured by Dako and Roche had all been originally developed and validated to support a therapeutic drug in the context of a clinical trial. These antibodies were evaluated in eight studies as follows:

  • Three studies looked at the IHC-SP142 (Roche), developed alongside atezolizumab [3, 16, 17]

  • Two studies looked at IHC-SP263 (Roche), developed alongside durvalumab [14, 28]

  • Two studies looked at IHC-22C3 (Dako), developed alongside pembrolizumab [4, 5]

  • One study looked at IHC-28-8 (Dako), developed alongside nivolumab [10]

Test-scoring algorithms and thresholds used among the PD-L1 tests

The thresholds and scoring systems used to determine PD-L1 positivity varied between the antibodies and across studies. Eleven studies [4, 7, 10,11,12, 14, 19,20,21, 23, 28] investigated dichotomous cutoffs (representing the proportion of cells with PD-L1 expressed) for PD-L1 positivity using different antibodies (the thresholds used in these studies are summarized in Table 2). Amongst these 11 studies, nine [4, 6, 9,10,11, 19, 20, 22, 27] set thresholds a priori (for example, based on cutoffs used in previously published research) and two studies [4, 13] attempted to establish an optimal threshold based on the study findings. In one study [18], it was unclear whether the thresholds used had been specified prospectively or retrospectively.

Table 2 Dichotomous Scoring Used Across Antibodies for PD-L1 IHC Tests in Lung Cancer

A further 11 studies [5, 9, 13, 15,16,17, 22, 24,25,26,27] used a hybrid score that combined components of staining intensity with the percentage of positive cells to determine PD-L1 positivity. One study evaluated two tests, SP142 (Roche) and E1L3N (CST; reagent provider), by means of a QIF process that used an automated scoring system. In this system, the QIF score of PD-L1 signal for each antibody in the tumor and stroma was calculated by dividing the target PD-L1 pixel intensities by cytokeratin and DAPI positivity [20].
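The hybrid scoring schemes differed between the included studies; as one common illustration (an H-score-style calculation, not necessarily the exact algorithm used in any study above), the percentage of cells at each staining intensity can be weighted and summed:

```python
def hybrid_h_score(percent_by_intensity):
    """Illustrative H-score-style hybrid score.

    percent_by_intensity maps staining intensity (0 = none, 1 = weak,
    2 = moderate, 3 = strong) to the percentage of cells staining at
    that intensity. The percentages must sum to 100; the score is the
    intensity-weighted sum, ranging from 0 to 300.
    """
    total = sum(percent_by_intensity.values())
    if abs(total - 100.0) > 1e-6:
        raise ValueError("intensity percentages must sum to 100")
    return sum(intensity * pct
               for intensity, pct in percent_by_intensity.items())

# 50% of cells unstained, 20% weak, 20% moderate, 10% strong:
# score = 0*50 + 1*20 + 2*20 + 3*10 = 90 (of a possible 300).
score = hybrid_h_score({0: 50.0, 1: 20.0, 2: 20.0, 3: 10.0})
```

A dichotomous cutoff (e.g., positivity at ≥1% of cells) can then be applied either to the raw percentage of positive cells or to a hybrid score of this kind, which is one source of the between-study variability described above.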

A second study [8] that incorporated QIF did not provide details on the scoring approach. Another study [12] investigated FISH; its evaluation criteria included CD274 and PDCD1LG2-to-CEP9 ratios, gene copy numbers, proportions of TCs with ≥4 PDL1/2 and ≥5 PDL1/2 signals, and gene clusters. Yet another study [6] validated a six-step scoring system that integrated all of the cutoff criteria from four tests that have been used in clinical trials: 28–8 and 22C3 (both Dako) and SP142 and SP263 (both Roche).

Types of cells tested for PD-L1 expression

There was variation among the studies with regard to the cell type tested, specifically, whether PD-L1 expression was measured on TCs and/or tumor-infiltrating ICs. Nine studies tested TCs only [4, 5, 7, 10,11,12, 21, 26, 27], two tested both TCs and tumor stroma [20, 29], 14 studies evaluated both TCs and ICs [3, 6, 8, 9, 13,14,15,16,17, 19, 22, 24, 25, 28], and in one study it was unclear which type of cell had been tested [23]. TCs were more frequently evaluated than tumor-infiltrating ICs or tumor stroma, regardless of whether dichotomous or hybrid scoring algorithms were used.

Test validation metrics

Individual test performance

Most studies (18/26) focused on a single antibody and reported validation metrics specific to the one test under investigation, without comparing its performance with that of another antibody or testing approach. The results of these studies are summarized by outcome below and in Table 3. Among the tests developed in a clinical trial setting to accompany a therapeutic product, the validation metrics were similar, and all the tests had greater than 90% inter-observer concordance [10]. In comparison, E1L3N, a test developed outside of clinical-trial settings (i.e., not specifically for a particular PD-1/PD-L1–targeted therapy), reportedly had slightly lower inter-observer concordance [21,22,23], at 84–88% at the 1% cutoff [21]. In the studies that reported intra-observer and inter-/intra-site concordance, high agreement (above 90%) was observed for all these metrics across the tests developed in a clinical trial setting to accompany a therapeutic product, except for inter-site concordance, which was 86.4% for SP263 (Roche; durvalumab) [14] and 88.3% for 22C3 (Dako; pembrolizumab) [5].

Table 3 Individual Test Performance: Test-Concordance Metrics

Two studies reported on the extent of agreement in test results when different types of samples (biopsy or surgical-resection) were tested, and these found some conflicting results. One study looked at the use of the SP142 test (Roche) in biopsy and surgical-resection samples. It reported an overall discordance rate of 48% (95% confidence interval, 4.64%–13.24%) and a κ score of 0.218, indicating poor agreement between the test outputs from the different sample types [13]. The study authors also commented that in all cases, the biopsy specimens underestimated the PD-L1 status relative to the expression level in the whole tumor (further data not provided in the study report). Another study found that overall concordance between biopsy and surgical-resection samples ranged from 82.5% (κ = 0.3969; i.e., fair agreement) at a hybrid score of 51 (range, 0–170) or greater, to 92.4% (κ = 0.8366; i.e., high agreement) at a score of 1 or greater [26].
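The concordance statistics quoted in this section (overall percent agreement and Cohen's κ) can be computed from paired dichotomous reads; the sketch below uses hypothetical reader calls, not data from any included study:

```python
def concordance_metrics(reader_a, reader_b):
    """Overall percent agreement and Cohen's kappa for two readers'
    dichotomous (1 = positive, 0 = negative) PD-L1 calls on the
    same set of slides."""
    assert len(reader_a) == len(reader_b) and reader_a
    n = len(reader_a)
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Chance agreement: product of the readers' marginal positive-call
    # rates plus the product of their negative-call rates.
    pa, pb = sum(reader_a) / n, sum(reader_b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 1.0
    return observed, kappa

# Hypothetical calls on 10 slides at some chosen cutoff:
a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
b = [1, 0, 0, 0, 1, 0, 1, 1, 0, 1]
agreement, kappa = concordance_metrics(a, b)
```

This also illustrates why a high percent agreement can coexist with a modest κ: κ discounts the agreement expected by chance, which is why the studies above report both figures.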

Head-to-head test performance

Seven studies reported data relating to the comparative performance of two or more tests, and their key findings are summarized in Table 4. Among these studies, three reported on the overall test concordance between two or more antibodies. The first found acceptable agreement between two tests developed in a clinical trial setting to accompany a therapeutic product, 28–8 (Dako; nivolumab) and SP263 (Roche; durvalumab), for which the overall test concordance was 90.3%. The remaining two studies found mixed results when a clinical trial test developed to support a therapeutic product was compared with E1L3N, which was not developed or validated as part of a clinical trial. Of these studies, one observed poor concordance when SP142 (Roche; atezolizumab) was compared with E1L3N (CST; reagent provider) (κ concordance at 1% cutoff = 0.340, 5% cutoff = 0.286, and 50% cutoff = 0.189) [20]. The other study reported moderate agreement between 28–8 (Dako; nivolumab) and E1L3N (75.0% and 86.2% at the 5% and 50% cutoffs, respectively) [11].

Table 4 Head-to-Head Test Performance: Test-Validation Metrics

Three of the head-to-head comparison studies [6, 13, 15] reported on differences between TC and IC staining patterns between antibodies, and they found mixed results: in some cases, SP142 stained fewer TCs but more ICs, whereas SP263 stained more TCs than ICs [6]. A further study [13] found good overall concordance between the SP142 and SP263 (both Roche) antibodies on TCs (κ = 0.412) but poor agreement between these antibodies on ICs (κ = 0.018). This study also reported poor agreement between SP142 and 28–8 antibodies [13] on TCs (κ = 0.412) and ICs (κ = 0.134), whereas good concordance was observed between the SP263 and 28–8 antibodies on both TCs (ρ = 0.996, κ = 0.883) and ICs (κ = 0.721). Another study [15] compared SP263 (Roche) with E1L3N (CST; reagent provider) and found that inter-pathologist correlation for membrane-tumor staining was similar between the antibodies (SP263 R2 > 0.87 vs E1L3N R2 > 0.82), while staining for ICs was lower with SP263 (R2 > 0.66) than with E1L3N (R2 > 0.80).

Harmonization of scoring algorithms across antibodies

One study reported on inter-observer concordance based upon a six-step scoring system that integrated the criteria employed by the four different clinical trial tests (28–8 and 22C3 [both Dako], SP142 and SP263 [both Roche]) and found moderate agreement using this harmonized approach (κ = 0.47 to 0.49) [6]. The study also reported good concordance coefficients (κ = 0.59 to 0.80) when using integrated dichotomous proportion cutoffs across the antibodies (≥ 1%, ≥ 5%, ≥ 10%, ≥ 50%); however, proportion scoring of PD-L1–positive ICs yielded lower inter-observer concordance coefficients both for the six-step score (κ < 0.2) and the dichotomous cutoffs (κ = 0.12 to 0.25). The authors concluded that unified PD-L1 IHC scoring criteria for TCs may be feasible, whereas scoring for ICs requires detailed training [6].

Bladder cancer

One study reported on the test-validation performance of a PD-L1 test in bladder/urothelial cell cancer for the antibody SP142 (Roche) and found it had acceptable inter-reader concordance between pathologists (> 90%) when measuring PD-L1 expression in both IC and TC in bladder/urothelial cell cancer [3].


The results of this SLR demonstrate that there are varied cutoff and scoring algorithm approaches among the commercially available PD-L1 antibody tests in lung cancer. There is, for example, no commonly accepted standard or threshold for determining positivity for each of the antibodies based on the proportion of PD-L1–positive cells. Further differences between scoring algorithms relate to the way in which staining patterns are interpreted; some studies have investigated the use of proportional scoring [4, 7, 10,11,12, 14, 19,20,21, 23, 28] for the respective antibodies, whereas other studies have looked at hybrid test-scoring methods that also take into account staining intensity [5, 13, 15,16,17,18, 22, 24,25,26,27].

In general, our review found that the concordance between tests developed in a clinical trial setting to accompany a therapeutic product was deemed acceptable, with inter-reader concordance exceeding 90% [7]. This finding is mirrored in recently published data from phase 1 of the Blueprint Project, which explored the analytical and clinical comparability of four PD-L1 IHC tests used in clinical trials (Dako 22C3, Dako 28–8, Roche SP142, and Roche SP263) and found comparable results across the tests when applied to assess TC staining in NSCLC, although the test SP142 resulted in fewer stained TCs overall (phase 2 of this project is now underway and will seek to validate these findings and also provide data on a fifth assay developed by Dako that uses the antibody 73–10). Our SLR did, however, find conflicting evidence concerning concordance when different antibodies developed in a clinical trial setting to accompany a therapeutic product were compared with those developed outside this type of setting, such as E1L3N [11, 15, 20].

Our findings are in line with other (non-systematic) reviews in this topic area, which have also reported on the variations in cutoffs used for different antibodies to determine PD-L1 positivity [30,31,32]. In particular, our research did not identify a definitive threshold that can be universally applied to predict clinical response to PD-L1–targeted precision treatments, a point noted previously by Festino et al. [30]. There were also differences among the studies included in our review in terms of the types of cells that were tested for PD-L1 expression (i.e., TCs only, or TCs and ICs), with some studies [13, 15] also noting differences in staining patterns and concordance depending on whether biopsy or surgical-resection samples were tested. Two recent review articles have also reported that cell type can play a key role in determining test outcomes. Specifically, these publications have indicated that ICs express significantly higher levels of PD-L1 than TCs (e.g., Ma et al. [31] and Festino et al. [30]) and that expression by TCs is sometimes more heterogeneous than that of ICs. It has also been theorized that different cell phenotypes/characteristics may contribute to this variability in PD-L1 expression across cancer cells [32].

One limitation of our review is that of the existing commercially marketed tests considered, most were IHC tests, with only three studies reporting on QIF [8, 20] and FISH [12]. We did not, for example, find any data on multimarker or next-generation tests that identify PD-L1 expression. In addition, only limited evidence was found on PD-L1 tests in bladder/urothelial cell cancer, and there were no validation studies for commercially available tests in gastric or ovarian cancers.

The heterogeneity in the findings of this review has important implications for clinical practice. Notably, the lack of standard thresholds for responder identification and of concordance between a subset of tests indicates the existence of (1) potential risks for efficient treatment selection and use of precision therapies; (2) confusion about whether it is important to request a particular PD-L1 test; and (3) potential adverse effects on patient management decisions (e.g., if the test thresholds used in clinical practice do not correspond with those used in the clinical trials in which particular IHC clones were developed and validated, and in which treatment efficacy was demonstrated, a patient may be inaccurately identified as a potential therapy recipient). However, it is also important to note that no study from our search results reported evidence for these possibilities. Ambiguity around test thresholds, decision algorithms, and the interchangeability of PD-1/PD-L1 testing could also present uncertainty for those payers who view accurate prediction of the subpopulation of treatment responders as a key value of precision therapy approaches. Where there is variability in the interpretation or selection of particular tests, there is the potential for physician confusion, interpretation dilemmas, and payer uncertainty.

There are illustrative examples of such difficulties from previous attempts to introduce biomarker testing into the selection of precision therapy and patient management. In the case of IHC and molecular testing for the epidermal growth factor receptor (EGFR), for instance, the substantial variability in test cutoffs or thresholds and the potential for variable interpretation of early-generation tests have been well documented. Following early introduction of tests for this marker and the initial launch of EGFR-targeted agents, some health technology assessment and payer organizations (notably, large commercial health plans in the United States and the Canadian Agency for Drugs and Technologies in Health [33] in Canada) had concerns around the interpretation and selection of some EGFR tests, arguing that the connection between test results and patient management or treatment selection was insufficiently clear. Another example occurred in the years immediately following the launch of trastuzumab, when there was significant controversy among physicians over the selection of HER2 IHC vs. FISH testing that led, in some cases, to slower uptake of the associated precision medicines. When clinical practice guidelines were updated to indicate that IHC testing should be conducted initially, with a subset of these patients receiving FISH testing for confirmation, this clarified the appropriate clinical testing pathway for prescribing trastuzumab [34]. These instances of uncertainty about how companion diagnostic tests should be interpreted and used had implications for access to precision treatments in some markets, and/or influenced uptake and use of these medicines and their associated tests [34,35,36].

Conducting additional studies and increasing both interpretation and education about test cutoffs would help to better inform the use of PD-1/PD-L1 diagnostics and ensure more consistent clinical assessment and application of the class of PD-1/PD-L1 inhibitors [31]. In addition, the available literature suggests that greater understanding is needed on the interchangeability of these PD-L1 tests for predicting response to anti-PD-L1 and anti-PD-1 targeted therapies. Such evidence would be crucial for supporting decision-making in a context where multiple PD-L1 tests are available (which seem to have variable validity in inter/intra-observer and inter/intra-site concordance) and where findings are not always consistent or reproducible across tests.


Most validation-metric data available for PD-L1 tests relate to the use of IHC tests in the context of lung cancer, and this evidence raises some key challenges that may influence the uptake of PD-L1 testing. In particular, standardization among available PD-L1 IHC tests is currently lacking (with regard to antibodies used, cutoffs/thresholds for a given antibody, and differences in scoring algorithm and test sites) and there is limited information on the extent, if any, to which the tests might be interchangeable. Developing strategies to address this variability in available IHC tests and publishing data that clarify the value of non–IHC-based approaches, such as FISH and next-generation tests that incorporate PD-L1, will be important to address as the availability of precision treatments focused on these biomarkers continues to increase.



Abbreviations

CST: Cell Signaling Technology

IC: Tumor-infiltrating immune cell

PD-1: Programmed death receptor 1

PD-L1: Programmed death ligand 1

QIF: Quantitative fluorescence

SLR: Systematic literature review

TC: Tumor cell


  1. Cree IA, Booton R, Cane P, Gosney J, Ibrahim M, Kerr K, et al. PD-L1 testing for lung cancer in the UK: recognizing the challenges for implementation. Histopathology. 2016;69(2):177–86.


  2. US Food & Drug Administration (FDA). Companion Diagnostics. Accessed 30 March 2017.

  3. Boyd ZS, Smith D, Baker B, Vennapusa B, Koeppen H, Kowanetz M, et al. Development of a PD-L1 companion diagnostic IHC assay (SP142) for atezolizumab. In: The inaugural international cancer immunotherapy conference; September 16–19, 2015; New York, NY [abstract B001].

  4. Garon EB, Rizvi NA, Hui R, Leighl N, Balmanoukian AS, Eder JP, et al. Pembrolizumab for the treatment of non-small-cell lung cancer. N Engl J Med. 2015;372(21):2018–28.

  5. Roach C, Zhang N, Corigliano E, Jansson M, Toland G, Ponto G, et al. Development of a companion diagnostic PD-L1 immunohistochemistry assay for pembrolizumab therapy in non-small-cell lung cancer. Appl Immunohistochem Mol Morphol. 2016;24(6):392–7.

  6. Scheel AH, Dietel M, Heukamp LC, Jöhrens K, Kirchner T, Reu S, et al. Harmonized PD-L1 immunohistochemistry for pulmonary squamous-cell and adenocarcinomas. Mod Pathol. 2016;29(10):1165–72.

  7. Anderson SM, Zhang L, Brailey L. Evaluation of two immunohistochemical assays for PD-L1 expression. J Clin Oncol. 2016;34(suppl) [abstract e23220].

  8. Gaule PB, Rehman J, Smithy JW, Toki MI, Han G, Neumeister V, et al. Measurement of spatial and antibody-based PD-L1 heterogeneity in non-small cell lung cancer. J Clin Oncol. 2016;34(suppl) [abstract 9040].

  9. Ilie M, Hofman V, Dietel M, Soria JC, Hofman P. Assessment of the PD-L1 status by immunohistochemistry: challenges and perspectives for therapeutic strategies in lung cancer patients. Virchows Arch. 2016;468(5):511–25.

  10. Phillips T, Simmons P, Inzunza HD, Cogswell J, Novotny J Jr, Taylor C, et al. Development of an automated PD-L1 immunohistochemistry (IHC) assay for non-small cell lung cancer. Appl Immunohistochem Mol Morphol. 2015;23(8):541–9.

  11. Rivalland G, Ameratunga M, Asadi K, Walkiewicz M, Knight S, John T, et al. Programmed death–ligand 1 (PD-L1) immunohistochemistry in NSCLC: comparison and correlation between two antibodies. J Clin Oncol. 2016;34(suppl) [abstract e20036].

  12. Schildhaus HU, Richardt P, Wilsberg L, Schmitz K. Occurrence of PDL1/2 copy number gains detected by FISH in adeno and squamous cell carcinomas of the lung and association with PDL1 overexpression in adenocarcinomas. J Clin Oncol. 2016;34(suppl) [abstract 3031].

  13. Ilie M, Falk AT, Butori C, Chamorey E, Bonnetaud C, Long E, et al. PD-L1 expression in basaloid squamous cell lung carcinoma: relationship to PD-1+ and CD8+ tumor-infiltrating T cells and outcome. Mod Pathol. 2016;29(12):1552–64.

  14. Rebelatto MC, Midha A, Mistry A, Sabalos C, Schechter N, Li X, et al. Development of a programmed cell death ligand-1 immunohistochemical assay validated for analysis of non-small cell lung cancer and head and neck squamous cell carcinoma. Diagn Pathol. 2016;11(1):95.

  15. Smith J, Robida MD, Acosta K, Vennapusa B, Mistry A, Martin G, et al. Quantitative and qualitative characterization of two PD-L1 clones: SP263 and E1L3N. Diagn Pathol. 2016;11(1):44.

  16. Chaft JE, Chao B, Akerley WL, Gordon M, Antonia SJ, Callahan J, et al. Evaluation of PD-L1 expression in metachronous tumor samples and FDG-PET as a predictive biomarker in Ph2 study (FIR) of atezolizumab (MPDL3280A). Oral presentation at: 16th World Conference on Lung Cancer; September 6–9, 2015; Denver, CO [ORAL 02.06].

  17. Fehrenbacher L, Spira A, Ballinger M, Kowanetz M, Vansteenkiste J, Mazieres J, et al. Atezolizumab versus docetaxel for patients with previously treated non-small-cell lung cancer (POPLAR): a multicentre, open-label, phase 2 randomised controlled trial. Lancet. 2016;387(10030):1837–46.

  18. Ilie M, Long-Mira E, Bence C, Butori C, Lassalle S, Bouhlel L, et al. Comparative study of the PD-L1 status between surgically resected specimens and matched biopsies of NSCLC patients reveal major discordances: a potential issue for anti-PD-L1 therapeutic strategies. Ann Oncol. 2016;27(1):147–53.

  19. Kowanetz M, Koeppen H, Boe M, Chaft JE, Rudin CM, Zou W, et al. Spatiotemporal effects on programmed death ligand 1 (PD-L1) expression and immunophenotype of non-small cell lung cancer (NSCLC). Oral presentation at: 16th World Conference on Lung Cancer; September 6–9, 2015; Denver, CO [ORAL 13.03].

  20. McLaughlin J, Han G, Schalper KA, Carvajal-Hausdorf D, Pelekanou V, Rehman J, et al. Quantitative assessment of the heterogeneity of PD-L1 expression in non-small-cell lung cancer. JAMA Oncol. 2016;2(1):46–54.

  21. Gainor JF, Shaw AT, Sequist LV, Fu X, Azzoli CG, Piotrowska Z, et al. EGFR mutations and ALK rearrangements are associated with low response rates to PD-1 pathway blockade in non-small cell lung cancer: a retrospective analysis. Clin Cancer Res. 2016;22(18):4585–93.

  22. Huynh TG, Morales-Oyarvide V, Campo MJ, Gainor JF, Bozkurtlar E, Uruga H, et al. Programmed cell death ligand 1 expression in resected lung adenocarcinomas: association with immune microenvironment. J Thorac Oncol. 2016;11(11):1869–78.

  23. Inamura K, et al. Relationship of tumor PD-L1 expression with EGFR wild-type status and poor prognosis in lung adenocarcinoma. Jpn J Clin Oncol. 2016;46(10):935–41.

  24. Inoue Y, Yoshimura K, Mori K, Kurabe N, Kahyo T, Mori H, et al. Clinical significance of PD-L1 and PD-L2 copy number gains in non-small-cell lung cancer. Oncotarget. 2016;7(22):32113–28.

  25. Mansfield AS, Murphy SJ, Peikert T, Yi ES, Vasmatzis G, Wigle DA, et al. Heterogeneity of programmed cell death ligand 1 expression in multifocal lung cancer. Clin Cancer Res. 2016;22(9):2177–82.

  26. Kitazono S, Fujiwara Y, Tsuta K, Utsumi H, Kanda S, Horinouchi H, et al. Reliability of small biopsy samples compared with resected specimens for the determination of programmed death-ligand 1 expression in non–small-cell lung cancer. Clin Lung Cancer. 2015;16(5):385–90.

  27. Marti AM, Martinez P, Navarro A, Cedres S, Murtra-Garrell N, Salva F, et al. Concordance of PD-L1 expression by different immunohistochemistry (IHC) definitions and in situ hybridization (ISH) in squamous cell carcinoma (SCC) of the lung. Presented at: American Society of Clinical Oncology Annual Meeting; May 30–June 03, 2014; Chicago, IL [abstract 7569].

  28. Midha A, Sharpe A, Scott M, Walker J, Shi K, Ballas M, et al. PD-L1 expression in advanced NSCLC: primary lesions versus metastatic sites and impact of sample age. Presented at: American Society of Clinical Oncology Annual Meeting; June 3–7, 2016; Chicago, IL [abstract 3025].

  29. Casadevall D, Pijuan L, Clave S, Taus A, Hernandez A, Lorenzo M, et al. Evaluation of tumor- and stromal immune marker heterogeneity in lung adenocarcinoma. J Clin Oncol. 2016;34(suppl) [abstract e20029].

  30. Festino L, Botti G, Lorigan P, Masucci GV, Hipp JD, Horak CE, et al. Cancer treatment with anti-PD-1/PD-L1 agents: is PD-L1 expression a biomarker for patient selection? Drugs. 2016;76(9):925–45.

  31. Ma W, Gilligan BM, Yuan J, Li T. Current status and perspectives in translational biomarker research for PD-1/PD-L1 immune checkpoint blockade therapy. J Hematol Oncol. 2016;9(1):47.

  32. Chen DS, Mellman I. Elements of cancer immunity and the cancer-immune set point. Nature. 2017;541(7637):321–30.

  33. Canadian Agency for Drugs and Technologies in Health (CADTH). Rapid response report: peer-reviewed summary with critical appraisal. Epidermal growth factor receptor mutation analysis in advanced non-small cell lung cancer: a review of the clinical effectiveness and guidelines. 2010. Accessed 11 May 2017.

  34. Wolff AC, Hammond ME, Hicks DG, Dowsett M, McShane LM, Allison KH, et al. Recommendations for human epidermal growth factor receptor 2 testing in breast cancer: American Society of Clinical Oncology/College of American Pathologists clinical practice guideline update. Arch Pathol Lab Med. 2014;138(2):241–56.

  35. Gupta R, Dastane AM, McKenna RJ Jr, Marchevsky AM. The predictive value of epidermal growth factor receptor tests in patients with pulmonary adenocarcinoma: review of current "best evidence" with meta-analysis. Hum Pathol. 2009;40(3):356–65.

  36. Hiley CT, Le Quesne J, Santis G, Sharpe R, de Castro DG, Middleton G, et al. Challenges in molecular testing in non-small-cell lung cancer patients with advanced disease. Lancet. 2016;388(10048):1002–11.




Funding

Funding for the design of the study, the collection, analysis, and interpretation of data, and writing assistance was provided by Pfizer, Inc., New York, NY, USA.

Availability of data and materials

Not applicable

Author information

EF, MR, and JK performed the SLR. MU, MR, JK, JD, SD, PR, and EF were all major contributors in writing the manuscript. MR, JK, and EF are employees of Evidera, who were paid consultants to Pfizer in connection with the development of this manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Margarita Udall.

Ethics declarations

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Competing interests

JK, MR, and EF are employed by Evidera, a consultancy company contracted to perform research. SD, PR, MU, and JD are employed by Pfizer.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Udall, M., Rizzo, M., Kenny, J. et al. PD-L1 diagnostic tests: a systematic literature review of scoring algorithms and test-validation metrics. Diagn Pathol 13, 12 (2018).
