Clinicopathological features of Epstein-Barr virus infection, microsatellite instability, tumor mutation burden and PD-L1 status in Chinese patients with gastric cancer

Objectives: Gastric cancer (GC) is the 4th most common type of cancer worldwide. Different GC subtypes exhibit unique molecular features that may help guide therapeutic decisions. The aim of the present study was to investigate Epstein-Barr virus (EBV) infection, microsatellite instability (MSI) status, programmed death-ligand 1 (PD-L1) expression and gene mutations in patients with surgically treated GC. Methods: The data of 2,504 GC patients who underwent potentially curative gastrectomy with lymphadenectomy at Peking University Cancer Hospital between 2013 and 2018 were reviewed from a prospectively collected medical database. We analyzed the clinicopathological factors associated with the immunohistochemistry (IHC) profiles of these patients, and genetic alterations were analyzed using next generation sequencing (NGS). Results: Mismatch repair-deficient (d-MMR) GC patients were more likely to express PD-L1 (p<0.001, PD-L1 cutoff value = 1%). Of the 2,504 gastric cancer patients, 4 and 6.9% were EBV-positive and d-MMR, respectively. The number of MLH1/PMS2-negative cases was 126 (6%), and the number of MSH2/MSH6-negative cases was 14 (0.9%). d-MMR status was associated with the diffuse/mixed Lauren type (p<0.05), but not with tumor differentiation. Furthermore, MSI status (detected by NGS) and d-MMR status (detected by IHC) were highly concordant, and the rate of MSI was higher in patients with d-MMR GC. Mutations in a number of genes associated with DNA damage repair, including POLE, ETV6, BRCA and RNF43, were detected in GC patients with MSI. In patients with a high tumor mutation burden (TMB), the most significantly mutated genes were LRP1B (79.07%), ARID1A (74.42%), RNF43 (69.77%), ZFHX3 (65.12%), TP53 (58.14%), GNAS (51.16%), BRCA2 (51.16%), PIK3CA (51.16%), NOTCH1 (51.16%), SMARCA4 (48.84%), ATR (46.51%), POLE (41.86%) and ATM (39.53%).
Conclusions: Using IHC and NGS, we characterized MSI status, protein expression, TMB and genetic alterations in patients with GC, providing a theoretical basis for the future clinical treatment of GC.


Introduction
Gastric cancer (GC) is the 4th most common cancer and the 2nd leading cause of cancer-related death worldwide [1]. Comprehensive molecular characterization at the genomic and transcriptomic levels has led to the identification of distinct GC subtypes [2,3]. These subtypes exhibit unique molecular features that could potentially be used to guide therapeutic decisions, and have been shown to have prognostic significance; for example, Epstein-Barr virus (EBV) infection has been associated with improved prognosis. Since the molecular classification of GC has potential prognostic and therapeutic implications, it may be used to identify biomarkers and therapeutic targets for each subtype, particularly through stratification according to EBV infection and microsatellite instability (MSI) [4][5][6][7][8]. Currently, the expression of programmed death-ligand 1 (PD-L1) in tumor cells is a validated predictive marker for the tumor response to anti-programmed cell death protein 1 (PD-1) or anti-PD-L1 immunotherapy in different malignancies, including GC [9]. According to The Cancer Genome Atlas (TCGA) study, as well as recent clinical trials, the EBV and MSI GC subgroups may benefit from therapy with PD-1/PD-L1 antibodies [10,11]. Furthermore, PD-1/PD-L1 inhibitors appear to show antitumor activity in patients with advanced GC [12][13][14][15]. However, the frequency and prognostic value of PD-L1 expression in GC remain controversial. Human epidermal growth factor receptor-2 (HER-2) is a 185-kDa transmembrane tyrosine kinase receptor of the epidermal growth factor receptor family that has been identified as a proto-oncogene. Growing evidence suggests that HER-2 is an important biomarker and key driver of GC tumorigenesis, associated with cellular proliferation, apoptosis and differentiation. When combined with systemic cytotoxic chemotherapy, trastuzumab is a therapeutic option for patients with advanced or metastatic HER-2-positive GC. Moreover, HER-2 overexpression or gene amplification is an important predictive marker in GC [16,17]. The aim of the present study was to investigate EBV infection status using in situ hybridization (ISH), and to determine mismatch repair (MMR) status and PD-L1 expression using immunohistochemistry (IHC), in surgically treated GC patients. Additionally, we analyzed the clinicopathological and prognostic factors associated with these IHC profiles, and used next generation sequencing (NGS) to analyze gene alterations, MMR-deficient (d-MMR)/MMR-proficient (p-MMR) status, tumor mutation burden (TMB) and MSI status in patients with GC.

Patients and general information
In the present study, we reviewed all patients with gastric adenocarcinoma in a prospectively collected medical database who underwent potentially curative gastrectomy with lymphadenectomy at Peking University Cancer Hospital between 2013 and 2018. The inclusion criteria were: (1) a confirmed diagnosis of gastric adenocarcinoma; and (2) the availability of formalin-fixed paraffin-embedded tissue blocks. Surgical specimens were fixed in 10% buffered formalin, and the sections were evaluated according to a protocol from the College of Chinese Pathologists. In some cases, IHC for cytokeratin was performed to detect lymph node micrometastasis. GC TNM staging was conducted according to the 2014 edition of Pathology and Genetics of Tumours of the Digestive System (World Health Organization Classification of Tumours). The study was approved by the ethics committee of Peking University Cancer Hospital, and all patients provided written informed consent prior to surgery.
All paraffin-embedded specimens were cut into 4-μm sections using conventional histological techniques and mounted on slides. IHC staining was performed using the HercepTest™ kit (Dako, Carpinteria, CA, USA) with an automated immunostainer (Dako), according to the manufacturer's protocol. Staining intensity was scored on a 0 to 3+ scale according to the test scoring criteria.

Immunohistochemical evaluation of PD-L1 expression
All tissue slides were evaluated by two pathologists. Specimens were scored according to the area of positively stained tumor cells or tumor-infiltrating immune cells as follows: 1, positive staining area <1%; 2, 1% to <10%; 3, 10% to <50%; and 4, ≥50%.
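The four-tier score described above can be written as a simple binning function. The sketch below assumes that each specimen's result is recorded as a single percentage of positively stained area; the function names `pdl1_score` and `pdl1_positive` are hypothetical and not part of any software used in the study.

```python
def pdl1_score(positive_area_pct: float) -> int:
    """Map the PD-L1-positive staining area (in %) to the 1-4 score.

    Bins follow the scheme in the text:
    <1% -> 1; 1% to <10% -> 2; 10% to <50% -> 3; >=50% -> 4.
    """
    if not 0.0 <= positive_area_pct <= 100.0:
        raise ValueError("positive area must be a percentage between 0 and 100")
    if positive_area_pct < 1.0:
        return 1
    if positive_area_pct < 10.0:
        return 2
    if positive_area_pct < 50.0:
        return 3
    return 4


def pdl1_positive(positive_area_pct: float, cutoff_pct: float = 1.0) -> bool:
    """True if the positive area reaches the cutoff (1% is the positivity cutoff used here)."""
    return positive_area_pct >= cutoff_pct
```

For example, a specimen with 12% positive area falls in score 3 and is PD-L1-positive at the 1% cutoff. Half-open bins (lower bound inclusive, upper bound exclusive) mirror the "from 1% to <10%" wording, so each percentage maps to exactly one score.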

Evaluation of MMR protein expression and EBV infection status by IHC and ISH
MMR protein expression was assessed by IHC using antibodies against MLH1 (clone no. GM002), MSH2 (clone no. RED2), MSH6 (clone no. EP49) and PMS2 (clone no. EP51) (all Gene Tech Biotechnology Co., Ltd., Shanghai, China). Loss of MLH1, MSH2, PMS2 or MSH6 expression was recorded only when nuclear staining was completely absent in tumor cells (0+ in 100% of tumor cells), with normal epithelial cells and lymphocytes serving as internal controls. Tumors with loss of any of these proteins were classified as d-MMR, whereas tumors that retained expression of all four were classified as p-MMR (any positive nuclear staining in tumor cells was considered retained expression, regardless of the percentage of positive cells). EBV infection status was determined using an EBV-encoded RNA (EBER) ISH kit (OriGene Technologies, Inc., Beijing, China) according to the manufacturer's protocol.
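The d-MMR/p-MMR rule above amounts to "loss of any one protein means d-MMR". A minimal sketch, assuming each IHC result has already been reduced to a yes/no judgement of retained nuclear staining with valid internal controls; the helper names are hypothetical.

```python
from typing import Dict, List

# The four MMR proteins assessed by IHC in this study.
MMR_PROTEINS = ("MLH1", "PMS2", "MSH2", "MSH6")


def lost_proteins(nuclear_staining: Dict[str, bool]) -> List[str]:
    """Return the MMR proteins with complete loss of tumor-cell nuclear staining.

    nuclear_staining maps each protein to True if any tumor-cell nucleus stains
    (retained, whatever the percentage) or False if staining is completely absent.
    Internal controls (normal epithelium, lymphocytes) are assumed to be valid.
    """
    missing = [p for p in MMR_PROTEINS if p not in nuclear_staining]
    if missing:
        raise ValueError("no IHC result for: " + ", ".join(missing))
    return [p for p in MMR_PROTEINS if not nuclear_staining[p]]


def classify_mmr(nuclear_staining: Dict[str, bool]) -> str:
    """Loss of any one protein -> 'd-MMR'; retention of all four -> 'p-MMR'."""
    return "d-MMR" if lost_proteins(nuclear_staining) else "p-MMR"
```

For instance, `classify_mmr({"MLH1": False, "PMS2": False, "MSH2": True, "MSH6": True})` returns `"d-MMR"`, and `lost_proteins` on the same input returns `["MLH1", "PMS2"]`, corresponding to the MLH1/PMS2-negative group counted in the results.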

TMB and gene mutation analysis
NGS (ChosenMed, Inc., Beijing, China) was used to determine the MSI status, TMB and gene mutations of the GC samples. TMB was assessed on an Illumina sequencing platform (PE150) with a sequencing depth greater than 3500x. Candidate MSI loci were defined as sequences of 1-5 bases repeated at least 5 times and were identified from the BAM files. The MSI thresholds were determined from large data sets from the European Genome-phenome Archive and TCGA panels: <20% was considered microsatellite stable (MSS), 20-30% MSI-low (MSI-L) and >30% MSI-high (MSI-H). Gene mutations were called using an assembly clustering algorithm rather than simple cutoff values; the detection limit for tissue samples was 2%. Variants present in the normal samples were classified as SNPs, and variants specific to the tumor samples as somatic mutations.
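The locus definition and MSI thresholds above can be illustrated on plain DNA strings. This is a toy sketch only: the vendor pipeline works on aligned reads in BAM files and decides whether each locus is unstable by comparing tumor and normal samples, which is not shown here. It also assumes (hypothetically) that the MSI percentage is the share of evaluable loci called unstable; `find_microsatellites`, `msi_percentage` and `classify_msi` are illustrative names, not the pipeline's API.

```python
import re
from typing import List, Tuple

# A repeat unit of 1-5 bases followed by at least 4 further copies (>= 5 copies in total).
# The lazy {1,5}? makes the regex prefer the shortest unit, so "ACACACACAC" is
# reported with unit "AC" rather than a longer composite unit.
_TANDEM_REPEAT = re.compile(r"([ACGT]{1,5}?)\1{4,}")


def find_microsatellites(seq: str) -> List[Tuple[int, str, int]]:
    """Return (start, repeat_unit, copy_number) for each non-overlapping tandem repeat."""
    hits = []
    for match in _TANDEM_REPEAT.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


def msi_percentage(n_unstable: int, n_evaluable: int) -> float:
    """Hypothetical MSI score: percentage of evaluable loci called unstable."""
    if n_evaluable <= 0:
        raise ValueError("at least one evaluable locus is required")
    return 100.0 * n_unstable / n_evaluable


def classify_msi(score_pct: float) -> str:
    """Thresholds from the text: <20% MSS, 20-30% MSI-L, >30% MSI-H."""
    if score_pct < 20.0:
        return "MSS"
    if score_pct <= 30.0:
        return "MSI-L"
    return "MSI-H"
```

For example, `find_microsatellites("GGAAAAAGG")` reports a single mononucleotide repeat `(2, "A", 5)`, and a sample with 3 of 10 evaluable loci unstable scores 30%, which the stated thresholds place in MSI-L.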

Immunohistochemical detection of HER-2 expression
HER-2 is a cell membrane protein. Tissues were stained and scored according to the HER-2 Detection Guide for Gastric Cancer as follows: 0, membrane staining in <10% of tumor cells; 1+, faint or barely visible membrane staining, or only partial membrane staining, in ≥10% of tumor cells; 2+, weak to moderate basal, lateral or complete membrane staining in ≥10% of tumor cells; and 3+, strong basal, lateral or complete membrane staining in ≥10% of tumor cells.
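The HER-2 criteria above combine three observations: the fraction of stained tumor cells, the staining intensity and the membrane pattern. The sketch below encodes the criteria as worded in the text under a simplifying assumption that each specimen has a single dominant staining pattern (real scoring considers the whole slide); the category labels and function name are hypothetical.

```python
INTENSITIES = ("faint", "weak", "moderate", "strong")
PATTERNS = ("partial", "basal", "lateral", "complete")


def her2_ihc_score(pct_stained_cells: float, intensity: str, pattern: str) -> str:
    """Simplified HER-2 IHC score for one dominant staining pattern.

    pct_stained_cells: percentage of tumor cells with membrane staining.
    intensity: one of INTENSITIES; pattern: one of PATTERNS.
    """
    if intensity not in INTENSITIES or pattern not in PATTERNS:
        raise ValueError("unknown intensity or staining pattern")
    if pct_stained_cells < 10.0:
        return "0"
    # Faint staining, or staining of only part of the membrane, is 1+.
    if intensity == "faint" or pattern == "partial":
        return "1+"
    if intensity in ("weak", "moderate"):
        return "2+"
    return "3+"  # strong basal, lateral or complete membrane staining
```

With these rules, 30% of tumor cells showing moderate lateral staining scores 2+, while any staining confined to fewer than 10% of tumor cells scores 0 regardless of intensity.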

Statistical analysis
Categorical variables were compared using the χ² test or Fisher's exact test, as appropriate. P-values <0.05 were considered statistically significant.
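The text does not state how "as appropriate" was decided; a common convention, assumed here, is to use Fisher's exact test when any expected cell count in a 2×2 table is below 5. The standard-library sketch below implements Pearson's χ² without continuity correction (df = 1, so the p-value is `erfc(sqrt(stat / 2))`) and a two-sided Fisher's exact test from the hypergeometric distribution. It covers 2×2 tables only; larger tables (e.g. four HER-2 categories) need an r×c test, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Table = List[List[int]]  # [[a, b], [c, d]]


def _expected(table: Table) -> List[List[float]]:
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    if 0 in rows or 0 in cols:
        raise ValueError("every row and column needs at least one observation")
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]


def chi_square_2x2(table: Table) -> Tuple[float, float]:
    """Pearson chi-square statistic (no continuity correction) and p-value, df = 1."""
    exp = _expected(table)
    stat = sum((table[i][j] - exp[i][j]) ** 2 / exp[i][j] for i in range(2) for j in range(2))
    return stat, math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) survival function


def fisher_exact_2x2(table: Table) -> float:
    """Two-sided Fisher's exact test: sum of tables no more probable than the observed one."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    denom = math.comb(r1 + r2, c1)
    probs = {x: math.comb(r1, x) * math.comb(r2, c1 - x) / denom
             for x in range(max(0, c1 - r2), min(r1, c1) + 1)}
    p_obs = probs[a]
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))


def compare_2x2(table: Table) -> Tuple[str, float]:
    """Pick the test by the expected-count < 5 rule (an assumed convention)."""
    if min(min(row) for row in _expected(table)) < 5:
        return "Fisher's exact test", fisher_exact_2x2(table)
    return "chi-square test", chi_square_2x2(table)[1]
```

For example, the hypothetical table `[[3, 0], [0, 3]]` has expected counts of 1.5, so `compare_2x2` falls back to Fisher's exact test and returns a two-sided p-value of 0.1. Note that some statistics packages apply Yates' continuity correction to 2×2 χ² tests by default, which gives slightly larger p-values than this sketch.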

Association between PD-L1 expression and the clinicopathological features of GC
PD-L1-positive cases were defined by the presence of membrane staining in at least 1% of tumor cells or tumor-infiltrating immune cells. By this definition, 20.2% of the patients investigated were PD-L1-positive. Tumor cell PD-L1 expression was identified in 11.6, 10.9 and 4% of cases at cut-off points of 1, 10 and 50%, respectively (based on the positively stained area of the cell membrane). d-MMR GC patients were more likely to express PD-L1 than p-MMR patients (p<0.001; PD-L1 cutoff value = 1%) (Figure 1).

The association between HER-2 expression, MMR status and EBER status
In the present study, 628/2,504 patients (25.1%) were HER-2 1+, 313/2,504 (12.5%) were HER-2 2+ and 102/2,504 (4.1%) were HER-2 3+. The remaining 1,461 patients (58.3%) showed no HER-2 protein expression. HER-2 expression was not associated with MMR or EBER status (p=0.129 and p=0.300; Tables 3 and 4, respectively).
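The HER-2 percentages can be re-derived from the raw counts reported in this paragraph. This short check uses only those counts and shows that 58.3% is the share of patients with a score of 0 (1,461/2,504), not a ratio of positive to negative cases; the variable names are illustrative.

```python
# HER-2 IHC counts reported in the text.
her2_counts = {"0": 1461, "1+": 628, "2+": 313, "3+": 102}

total = sum(her2_counts.values())
# Share of all patients in each score category, rounded to one decimal place.
pct_by_score = {score: round(100 * n / total, 1) for score, n in her2_counts.items()}

any_expression = total - her2_counts["0"]  # scores 1+ to 3+
pct_any_expression = round(100 * any_expression / total, 1)
expressed_to_unexpressed = round(100 * any_expression / her2_counts["0"], 1)

print(total, pct_by_score)
print(pct_any_expression, expressed_to_unexpressed)
```

Running it prints `2504 {'0': 58.3, '1+': 25.1, '2+': 12.5, '3+': 4.1}` and then `41.7 71.4`: 41.7% of patients showed some HER-2 expression (1+ to 3+), and the ratio of expressing to non-expressing cases is about 71.4%.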

Discussion
GC is the fourth most common type of cancer worldwide. Because of late diagnosis, the disease is frequently inoperable and often recurs after curative resection. For patients with advanced and/or unresectable cancer, systemic chemotherapy is generally the primary therapeutic option [18]. Traditionally, GC classification has been based on histopathological and morphological features, which were first described in 1965 [19,20]. Unfortunately, morphology-based classifications cannot identify molecular targets. Large-scale molecular profiling via NGS has therefore given rise to several molecular classification systems that may be exploited for therapeutic intervention. HER-2 has long been associated with cellular differentiation, proliferation and apoptosis, and trastuzumab combined with systemic chemotherapy is the treatment of choice for patients with advanced or metastatic HER-2-positive GC. Moreover, HER-2 overexpression or gene amplification is an important predictive indicator in GC; however, no large-scale studies of HER-2 expression in GC have thus far been conducted in China. Wang et al [21] studied 135 patients with GC and reported a HER-2 protein expression rate of 39.3%. Among the 2,504 GC patients in our study, HER-2-positive (3+) cases accounted for 4.1%, and HER-2 2+ cases accounted for 12.5%. HER-2 expression was not associated with EBER or MMR status. However, in our previous study of >3,000 cases of colorectal cancer, HER-2 3+ expression was more prevalent in p-MMR patients.
PD-L1 can serve as a predictive marker for the response to immunotherapy, which is of great significance in clinical practice. Since both markers are important for the treatment of GC, the relationship between HER-2 and PD-L1 expression requires further in-depth investigation. Anti-PD-1/PD-L1 immunotherapy may provide an alternative treatment option for patients with HER-2-negative GC.
A number of studies have demonstrated that GC patients with EBV infection comprise ~9% of all cases of GC, and that EBV-associated GC constitutes a distinct clinicopathological and molecular entity [22]. In the present study, EBV positivity was 4%, and the EBV-positive patients were predominantly male, with a diffuse/mixed Lauren type and poor tumor differentiation (p<0.001). The EBV infection rate in our study was thus lower than the global average. The Alaska Native (AN) population exhibits the highest incidence and mortality rates of GC in North America, with an EBV infection rate of 20%, far greater than the global average of 10% [23]. In that study, molecular markers of solid tumors were also examined in 85 GC patients, and the mutation burden and number of somatic mutations were lower in tumors from AN patients; TP53 was the most commonly mutated gene. In a Japanese study of 1,067 GC cases, the EBV positivity rate was 7.1% [24], indicating that the EBV infection rate in GC differs between regions. The involvement of EBV in chronic gastric inflammation is less well understood, although multiple studies have argued that EBV (similar to and together with Helicobacter pylori) is an early participant in the GC oncogenic process, where it promotes chronic inflammation and subsequently aggravates tissue damage. By affecting various cellular processes and signaling pathways, EBV infection may also contribute to the malignant transformation of gastric cells. In contrast to other GC subtypes, GC patients with EBV infection exhibited a number of distinct characteristics in the present study. Using a PD-L1 cutoff of 1%, there was no significant difference in PD-L1 expression between EBV-positive and EBV-negative patients (p=0.524). By contrast, in a small case study [25], PD-L1 expression was significantly associated with EBV infection (p<0.001).
In our study of 2,504 patients, PD-L1 expression was more frequent in d-MMR patients (p<0.001; PD-L1 cutoff value = 1%). Haron et al [26] retrieved a total of 60 GC cases; microsatellite analysis identified 10 MSI-positive cases (16.7%), of which six (10.3%) did not express MLH1 (n=3) or MSH2 (n=3) protein. In our study, the number of MLH1/PMS2 protein-deficient cases was 126 (6%), and the number of MSH2/MSH6 protein-deficient cases was 14 (0.9%). We speculate that differences in PD-L1 antibodies, tissue processing methods and PD-L1 scoring systems may account for the wide range of reported expression rates.
EBV is a carcinogenic virus, and studies have shown that the expression levels of ARID1A and PIK3CA are closely associated with the depth of GC invasion. Moreover, PIK3CA mutations in EBV-associated GC are usually accompanied by ARID1A mutations [27]. In the present study, we detected a high frequency of ARID1A and PIK3CA mutations; in future work, we intend to investigate the relationship between ARID1A, PIK3CA and EBER status, and to analyze the expression of these proteins in GC and adjacent normal tissues.
In the present study, a number of the GC cases were sequenced, and cluster analysis was performed to identify differentially expressed genes. Cho et al [28] performed massively parallel sequencing of 381 cancer-related genes and compared the results with the clinicopathological findings of 330 patients with GC; the most significantly mutated genes were TP53 (54%), ARID1A (23%), CDH1 (22%), PIK3CA (12%), RNF43 (10%) and KRAS (9%). Yoon et al [29] identified 18,377 microsatellite (MS) mutations of five or more repeat nucleotides in gene coding sequences and untranslated regions (UTRs), and discovered 139 individual genes whose expression was downregulated in association with UTR MS mutations. In our study, mutations in numerous DNA damage repair (DDR)-associated genes, including ETV6, TP53, BRCA, POLE and RNF43, were detected in d-MMR patients; the most significantly mutated genes in d-MMR patients were LRP1B (79.07%), ARID1A (74.42%), RNF43 (69.77%), ZFHX3 (65.12%), TP53 (58.14%), GNAS (51.16%), BRCA2 (51.16%), PIK3CA (51.16%), NOTCH1 (51.16%), SMARCA4 (48.84%), ATR (46.51%), POLE (41.86%) and ATM (39.53%). LRP1B therefore had the highest mutation rate, at 79.07%. LRP1B belongs to the low-density lipoprotein (LDL) receptor gene family; through interactions with their ligands, these receptors play a wide range of roles in normal cell function and development [30]. LRP1B is also a novel candidate tumor suppressor gene that has been associated with immunotherapeutic success. LRP1B is inactivated by genetic or transcriptional alterations in nearly 40% of non-small cell lung cancer cell lines [31]. Like LRP1, another member of the LDL receptor family, LRP1B can inhibit tumor cell invasion and metastasis by antagonizing the extracellular urokinase plasminogen activator (uPA) proteolytic system, thereby limiting extracellular matrix degradation and cellular migration. In the future, we aim to determine whether the expression of the proteins encoded by these genes is altered in GC. LRP1B mutations have also been associated with high TMB and poor patient survival, although the relationship between LRP1B mutations and survival in GC is not well understood.
ZFHX3 plays an important role in the circadian clock, the disruption of which may be detrimental to human health. ZFHX3 has been shown to inhibit the proliferation of prostate cancer cells by downregulating MYC expression [32]; hence, mutated ZFHX3 may contribute to cancer development. The RNF43 mutation results in a frameshift that leads to early truncation and potential inactivation of the protein, and may therefore be pathogenic. Preclinical studies of gastric and colorectal cancer have shown that RNF43 inactivation promotes cellular proliferation and tumor growth [33]. Yu et al reported a high frequency of RNF43 mutations in colorectal signet ring cell carcinoma and showed that mutated RNF43 activates the Wnt pathway [34]. As with RNF43, frameshift mutations in BRCA2 lead to early truncation of the protein, and the resulting inactivation may be pathogenic. BRCA2 mutations have been widely reported in breast cancer [35] but have not been extensively studied in GC. The PIK3CA Y1021C mutation is located within the PI3K/PI4K domain of the PIK3CA protein and increases the transforming ability of cultured cell lines [36]. GNAS R201C is located in the GTP-binding region of the GNAS protein; in a mouse model, R201C resulted in loss of GTPase activity, continuous activation of downstream signaling, cellular proliferation and tumor formation. The mutation rate of GNAS in non-ampullary duodenal adenocarcinoma has been reported to be 6.5% [37]. However, GNAS mutations have been studied more extensively in pancreatic and biliary tumors than in GC.
The ATM mutation leads to premature truncation of the ATM protein; because all known functional domains are deleted, this mutation is predicted to result in loss of protein function [38]. ARID1A, a subunit of the SWI/SNF chromatin remodeling complex, is frequently mutated in GC, and its mutation is associated with poor patient prognosis, potentially because decreased ARID1A expression or function can activate the AKT signaling pathway. The levels of multiple immune markers and TMB were significantly higher in patients with ARID1A mutations than in ARID1A wild-type patients. ARID1A promotes MMR; accordingly, ARID1A defects are associated with d-MMR and MSI. PD-L1 expression was also significantly higher in alimentary tract cancer patients with ARID1A mutations than in wild-type patients [39][40][41]. In the present study, NGS revealed a high number of ARID1A mutations in d-MMR patients; we therefore intend to analyze the relationship between PD-L1 expression and ARID1A in future research. The PTEN E157G mutation is located in the phosphatase tensin-type domain of PTEN and is predicted to result in loss of protein function; V158F is likewise predicted to inactivate PTEN. In patients with HER-2-positive GC, PTEN deletion mutations are associated with trastuzumab resistance, and loss of heterozygosity of this gene has been reported frequently in GC [42]. The relationship between PTEN protein loss and other gene alterations in GC is not clear. Kim et al reported a PTEN mutation rate of 10.6% among 322 patients with advanced GC, and loss of PTEN function was associated with MSI-H and EBV-positive status [43]. In solid tumor patients receiving immunotherapy, median overall survival (OS) was significantly longer in carriers of POLE/POLD1 mutations than in non-carriers. Additionally, 26% of patients with POLE/POLD1 mutations also exhibited MSI-H status. After these patients were excluded, OS in the mutant group remained improved, suggesting that POLE/POLD1 mutations may identify patients with MSS tumors (who generally do not benefit from immunotherapy) who could nonetheless benefit from it. Multivariate analysis confirmed that POLE/POLD1 mutation may serve as a novel independent predictor of immunotherapeutic benefit [44].
MMR status can affect the treatment of gastric cancer, as d-MMR patients are more likely to benefit from immunotherapy. Patil et al [45] studied PD-L1 expression in gastric cancer and its association with CD8 in the immune microenvironment. Like our study, that study used immunohistochemical staining, although on tissue microarrays, and no next generation sequencing data were presented. We believe that our findings (such as the gene mutations detected) are also of research and therapeutic relevance for GC patients in the United States. Patil et al analyzed 86 patients using tissue microarrays, whereas we analyzed 2,504 patients using larger tissue sections from postoperative specimens. Immunohistochemical detection of the four MMR proteins may therefore be more accurate in our study, because tissue microarray cores are very small and may not fully represent the protein expression in a patient's tumor. Furthermore, the d-MMR frequency was 22% in the study by Patil et al, compared with 7.5% in our study. One possible explanation is that the microarray cores did not capture the positively stained regions of some tumors, so that these cases were classified as d-MMR, inflating the d-MMR rate. In future research, we will study the molecular markers of immune cells and tumor cells in the tumor microenvironment.
To the best of our knowledge, this is the largest study to investigate the pathological characteristics of GC patients in China. Using IHC, ISH and NGS, this study provides a deeper understanding of GC, including MSI status, HER-2 and PD-L1 expression, TMB and gene alterations, which offers a theoretical basis for the future clinical treatment of GC. Our future studies will aim to elucidate the mechanisms by which these mutations affect the development of GC. Although molecular typing of GC is important, time constraints prevented us from analyzing the associations of gene alterations with survival and tumor stage; these analyses will be conducted in our next study.