A transcriptome profile in hepatocellular carcinomas based on integrated analysis of microarray studies

Background Despite recent new treatment options for hepatocellular carcinoma (HCC), 5-year survival remains poor, ranging from 50 to 70%, which may be attributed to the lack of early diagnostic biomarkers. Developing new biomarkers for the early diagnosis of HCC is therefore urgent in order to decrease HCC-related deaths. Methods We conducted a comprehensive characterization of HCC gene expression data using a bioinformatics method. The results were confirmed by real-time polymerase chain reaction (RT-PCR) and the TCGA database to assess the credibility of this integrated analysis. Results After integrated analysis of seven HCC gene expression datasets, 1167 differentially expressed genes (DEGs) were identified. These genes mainly participated in the cell cycle, oocyte meiosis, and progesterone-mediated oocyte maturation. The results of experimental and TCGA database validation for 10 genes were in full accordance with the findings of the integrated analysis, indicating the high credibility of our integrated analysis of different gene expression datasets. ASPM, CCT3, and NEK2 were shown to be significantly associated with overall survival of HCC patients in the TCGA database. Conclusion This method of integrated analysis may be a useful tool to diminish the heterogeneity of individual microarray studies, may yield more accurate HCC transcriptome profiles based on a large sample size, and may reveal potential biomarkers and therapeutic targets for HCC. Electronic supplementary material The online version of this article (doi:10.1186/s13000-016-0596-x) contains supplementary material, which is available to authorized users.


Background
Hepatocellular carcinoma (HCC) is one of the most frequently occurring malignant tumors worldwide [1]. The risk factors for HCC are well recognized and include gender, infection with hepatitis B virus or hepatitis C virus, cirrhosis, metabolic diseases, toxins, excess alcohol consumption, and smoking. The incidence of HCC varies widely with geography and is higher in Asia, Africa, and southern Europe. It is well established that surgery in patients with early HCC achieves a higher curative resection rate (80.5%) [2] and ultimately a better survival rate. However, patients with early HCC frequently present with atypical symptoms; hence, most patients are diagnosed with advanced HCC at their first visit, resulting in a low 5-year survival rate ranging from 50 to 70% [3]. Therefore, the development of biomarkers for early diagnosis is being emphasized to prolong the survival of patients with HCC.
Over the last decades, great efforts have been made to promote the early diagnosis of HCC. Alpha-fetoprotein (AFP) has been the most commonly used biomarker for tumors of the liver, testicles, and ovaries [4]. Highly sensitive and specific biomarkers still need to be developed for HCC diagnosis. Glypican-3 (GPC3), a membrane-associated heparan sulfate proteoglycan, is up-regulated in HCC. Additionally, GPC3 is involved in the Hippo pathway, through which it exerts its function in HCC cell proliferation. GPC3 may be applied in clinical practice as a novel diagnostic biomarker [5].
Additionally, some researchers have attempted to use prognostic markers to predict HCC recurrence. Villa E et al. performed whole-genome microarray expression profiling of 161 HCC samples and found that a five-gene signature (ANGPT2, NETO2, NR4A1, DLL4, ESM1) was able to predict fast growth and worse survival of HCC patients [6]. The exploration of prognostic markers may facilitate individualized therapies.
Recently, the detection of genome-wide gene transcripts expressed in a given tissue type has become increasingly feasible with the advent of high-throughput technologies such as microarray and RNA-seq. The application of microarray-based gene expression profiling has produced a tremendous amount of information and provided mechanistic insights into the oncogenic process of HCC [7]. However, although many microarray studies of HCC have been performed [8][9][10][11], each study holds a somewhat different view owing to heterogeneity in clinical samples, platforms, analytical approaches, etc. To address this, an integrated analysis of seven HCC gene expression datasets was conducted to identify differentially expressed genes (DEGs) between tumor and normal tissues, revealing a common biological thread linking the disparate microarray studies. Ten genes were selected for further real time polymerase chain reaction (RT-PCR) and TCGA database validation to assess the credibility of this integrated analysis. We expect our study to be of some value for the future diagnosis and therapy of HCC in the clinic.

Eligible HCC gene expression datasets
The raw gene expression datasets of HCC and control samples were selected and downloaded from the Gene Expression Omnibus (GEO) database. Datasets meeting the following criteria were included: i) whole-genome expression profiles; ii) data from tumor and tumor-adjacent normal liver tissues of HCC patients in the clinic; iii) raw or standardized data. Cirrhotic liver tissue sets, non-human sets, and integrated analyses of gene expression profiles were excluded.

Identification of HCC gene expression profile
We used the Z-score transformation method [12] to normalize raw data from different platforms. MATrix-LABoratory (MATLAB) software was used to identify differentially expressed probe sets between tumor and tumor-adjacent normal tissues with a gene-specific t-test. Genes with FDR ≤ 0.05 were selected as significantly differentially expressed genes (DEGs). Heat map analysis was conducted using the "heatmap.2" function of the R package "gplots" [13].
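As an illustration only, the normalization and FDR filtering steps can be sketched in Python (the study itself used MATLAB). The sketch assumes that the Z-score is computed per array using the sample standard deviation and that the FDR is controlled with the Benjamini-Hochberg procedure; neither detail is stated in the text. The gene names and p-values are hypothetical placeholders standing in for the output of the gene-specific t-test.

```python
from statistics import mean, stdev

def zscore_array(intensities):
    """Z-score one array (sample): centre on the array mean, scale by the array SD."""
    mu = mean(intensities)
    sd = stdev(intensities)  # sample SD; an assumption, not stated in the source
    return [(x - mu) / sd for x in intensities]

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-gene p-values, e.g. from a gene-specific t-test.
pvals = {"GENE_A": 0.001, "GENE_B": 0.02, "GENE_C": 0.03, "GENE_D": 0.6}
fdr = dict(zip(pvals, benjamini_hochberg(list(pvals.values()))))

# Keep genes at the FDR threshold used in the text.
degs = [gene for gene, q in fdr.items() if q <= 0.05]
```

Putting every array on a common scale before testing is what allows probe sets from different platforms to be compared in a single analysis.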

Gene ontology (GO) of differentially expressed genes
GO and pathway enrichment were analyzed with the online tool GENECODIS (http://genecodis.cnb.csic.es) [14] to facilitate the interpretation of the biological roles of the DEGs. The GO functions of the DEGs were determined in three categories: biological process, molecular function, and cellular component. In addition, pathway enrichment analysis was based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database.
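The idea behind an enrichment analysis of this kind can be sketched as an over-representation test. The example below assumes a one-sided hypergeometric test, a common choice for such tools; the text does not state which test was applied, and the function name and all counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_in_term, n_degs, n_overlap):
    """P(X >= n_overlap) when n_degs genes are drawn without replacement
    from n_background genes, n_in_term of which carry the annotation."""
    upper = min(n_in_term, n_degs)
    total = comb(n_background, n_degs)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 20 background genes, 5 annotated to a pathway,
# 4 DEGs in total, 3 of which fall in the pathway.
p_value = hypergeom_enrichment_p(20, 5, 4, 3)
```

A small p-value indicates that the annotation occurs among the DEGs more often than expected by chance; in practice the p-values for all tested terms would also be corrected for multiple testing.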

Protein-protein interaction (PPI) network construction
To find candidate genes involved in the oncogenesis and hepatic dysfunction of HCC, PPI networks of the significant DEGs were constructed from data in the Biological General Repository for Interaction Datasets (BioGRID) (http://thebiogrid.org/). Among the candidate genes, the PPI networks of the top 20 most significantly dysregulated genes were visualized with Cytoscape [15].
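One simple way to see which genes are highly connected in a PPI network is to count interaction partners per gene (node degree). The sketch below assumes the interactions are available as a list of gene pairs; the edge list and gene names are hypothetical placeholders, not interactions taken from BioGRID.

```python
from collections import defaultdict

def node_degrees(edges):
    """Count distinct interaction partners for every gene in an undirected network."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(partners) for node, partners in neighbours.items()}

# Hypothetical edge list; the duplicate (GENE_B, GENE_A) pair is counted once.
edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_C"),
    ("GENE_B", "GENE_A"),
]
degrees = node_degrees(edges)

# Rank genes from most to least connected, breaking ties alphabetically.
hubs = sorted(degrees, key=lambda gene: (-degrees[gene], gene))
```

Storing partners in sets keeps duplicated or reversed pairs from inflating the degree of a gene.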

RNA isolation and RT-PCR validation
Tumor and matched adjacent normal liver tissues were obtained from five HCC patients in the current study, frozen immediately after surgery, and stored at −135°C for RNA extraction. Frozen sections were made and evaluated independently by senior pathologists. The study was approved by the ethics committee of the First Affiliated Hospital of PLA General Hospital. The ethics committee approved the related screening, inspection, and data collection of the patients, and all subjects signed a written informed consent form. All work was undertaken in accordance with the provisions of the Declaration of Helsinki.
Total RNA was extracted from the liver tissue of each sample using the RNeasy Mini Kit (Qiagen, Valencia, CA) according to the manufacturer's protocol. Ten genes were randomly selected from the 20 most significant DEGs. Primers for the ten genes were designed using PrimerPlex 2.61 (PREMIER Biosoft, Palo Alto, CA) (Additional file 1: Table S1). Gene expression levels were measured with SYBR (Applied Biosystems/Life Technologies, Carlsbad, CA) on an ABI 7500 Real Time PCR System (Applied Biosystems, Carlsbad, CA). Relative gene expression was calculated with Data Assist Software version 3.0 (Applied Biosystems/Life Technologies), with the human actin gene used as a reference. The expression level of each gene was determined by the 2^−ΔΔCt method.
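The relative quantification step can be written out as a short calculation. The following is a minimal sketch of the standard 2^−ΔΔCt formula, assuming one Ct value per gene and tissue and ideal amplification efficiency; the function name and the Ct values are hypothetical and are not data from this study.

```python
def relative_expression(ct_target_tumor, ct_ref_tumor,
                        ct_target_normal, ct_ref_normal):
    """Fold change of a target gene in tumor versus normal tissue (2^-ddCt)."""
    # Normalize the target gene to the reference gene within each tissue.
    delta_ct_tumor = ct_target_tumor - ct_ref_tumor
    delta_ct_normal = ct_target_normal - ct_ref_normal
    # Compare the normalized values between tumor and normal tissue.
    delta_delta_ct = delta_ct_tumor - delta_ct_normal
    return 2 ** (-delta_delta_ct)

# Hypothetical Ct values: target gene and reference gene in tumor and normal tissue.
fold_change = relative_expression(22.0, 18.0, 25.0, 18.0)  # 8.0, i.e. up-regulated
```

A fold change above 1 indicates higher expression in the tumor than in the matched normal tissue, and a value below 1 indicates lower expression.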

TCGA database validation of selected genes in HCC patients
Using online validation tools, the expression status of the selected genes in HCC was determined in the TCGA database (https://genome-cancer.ucsc.edu/) by assessing their mRNA expression patterns in HCC patients (N = 423) [16]. The association between the expression pattern of the selected genes and the overall survival time of HCC patients was also evaluated in the TCGA database (http://cbioportal.org) (N = 442) [17].
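For readers who wish to reproduce a survival comparison offline, the sketch below implements a two-group log-rank test, which is commonly used to compare overall survival between patients with high and low expression of a gene. The text does not state which test the online tool applied, so the choice of test is an assumption, and the survival times are hypothetical.

```python
from math import erfc, sqrt

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. events_* are 1 for death and 0 for censoring.
    Returns (chi-square statistic, p-value with 1 degree of freedom)."""
    records = [(t, e, "a") for t, e in zip(times_a, events_a)]
    records += [(t, e, "b") for t, e in zip(times_b, events_b)]
    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, e, _ in records if e}):
        n = sum(1 for u, _, _ in records if u >= t)                # at risk overall
        n_a = sum(1 for u, _, g in records if u >= t and g == "a")  # at risk in group a
        d = sum(1 for u, e, _ in records if u == t and e)           # deaths at time t
        d_a = sum(1 for u, e, g in records if u == t and e and g == "a")
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2))

# Hypothetical survival times (months); every patient has an observed event.
chi2, p_value = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
```

The test compares the observed number of deaths in one group with the number expected if both groups shared the same survival experience.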

Candidate genes involved in the occurrence of HCC
Seven microarray datasets of HCC were identified according to the inclusion criteria. Among them, GSE17548, GSE33006, GSE17856, and GSE1481 did not contain gene expression data for tumor-adjacent normal liver tissues. A total of 267 HCC samples and 67 control samples were enrolled in the integrated analysis. The information for each microarray dataset is shown in Table 1. Based on the microarray datasets available for integrated analysis, a total of 1167 DEGs were identified, of which 628 genes were up-regulated and 539 genes were down-regulated. Detailed information on the 20 most significantly up-regulated or down-regulated genes is shown in Additional file 1: Table S2. The top 50 most significant DEGs are displayed in a heat map across the different HCC microarray datasets (Fig. 1). (Table 2). Based on the KEGG database, the 1167 DEGs were involved in 99 signaling pathways, including cell cycle, oocyte meiosis, progesterone-mediated oocyte maturation, pathways in cancer, p53 signaling pathway, phagosome, fatty acid metabolism, cytokine-cytokine receptor interaction, prion diseases, etc. (Table 3).

Experimental and TCGA database validation of selected genes in HCC patients
Ten genes (ASPM, CAP2, CCT3, NEK2, SNRPE, CLEC4M, DCN, ECM1, RND3 and SPINT2) were randomly selected from the 20 most significantly up-regulated or down-regulated genes. After RT-PCR, the expression levels of the 10 selected genes in clinical samples were consistent with the results of the integrated analysis. For all ten genes, mRNA expression differed significantly between tumor and matched adjacent normal liver tissues (P < 0.01) (Fig. 3; Additional file 1: Table S3). Furthermore, TCGA database validation indicated that these genes showed expression trends similar to those obtained from the integrated analysis (Fig. 4). Among the ten genes, only ASPM, CCT3, and NEK2 showed a significant association with the overall survival time of HCC patients in the TCGA database (P < 0.05) (Fig. 5).

Discussion
It is generally accepted that the altered gene expression pattern of a cancer tissue is associated with the initiation and maintenance of the malignant phenotype. Previous studies have identified several HCC gene expression profiles [18][19][20][21]. However, no common pattern emerged among these disparate studies. In this study, we integrated different microarray studies to identify a more precise gene expression profile for HCC, with greater statistical power supported by a large sample size. An integrated analysis of seven HCC microarray datasets identified 1167 DEGs, of which 628 genes were up-regulated and 539 genes were down-regulated. These genes mainly participated in the cell cycle, oocyte meiosis, and progesterone-mediated oocyte maturation.
In the current study, further annotation and PPI network analysis of the 20 most significant DEGs were conducted. Most of the 20 genes were involved in the pathways of cell cycle, cytokine-cytokine receptor interaction, and intracellular signaling cascades, and their involvement in HCC has also been reported [22][23][24][25][26]. The functions of the 20 genes were in accordance with the results of the GO and KEGG analyses. Three genes, CCT3, NDC80, and ASPM, were found to be highly connected in the PPI network. CCT3, a subunit of the CCT complex, assists the folding of proteins involved in important biological processes. CCT3 was found to display a significantly different expression level in HCC compared with adjacent non-malignant liver tissues, arising from the occurrence of the amplicon 1q21-q22 [27], which is consistent with our RT-PCR validation result. In addition, the expression status of the other genes detected by RT-PCR was fully in accordance with the results of the integrated analysis, suggesting that the bioinformatics method of integrated analysis was credible. ASPM is highly expressed in fetal tissues but weakly expressed in most adult tissues. Our results and previous evidence [23] showed that ASPM and NEK2 mRNA was over-expressed in HCC. Moreover, we found that over-expression of ASPM, NEK2, and CCT3 was significantly associated with overall survival of HCC patients based on TCGA validation, predicting enhanced invasive/metastatic potential of HCC and a higher risk of early tumor recurrence. ASPM, NEK2, and CCT3 may serve as potential prognostic biomarkers for HCC. CAP2 overexpression was also found in our study, and CAP2 has been suggested as a candidate biomarker of HCC owing to its elevated level in the serum of HCC patients [28].
Among the 10 most significantly down-regulated genes, DCN (decorin), an extracellular matrix proteoglycan, has important biological functions in growth, development, and disease. Decorin is known to interfere with cellular events of tumorigenesis, mainly by blocking various receptor tyrosine kinases such as EGFR, Met, IGF-IR, PDGFR, and VEGFR2. Loss of the decorin gene is permissive for tumorigenic growth of HCC, with decreased levels of the cyclin-dependent kinase inhibitor p21 (WAF1/CIP1), suggesting the potential use of DCN as an antitumor agent in HCC [29]. RND3 down-regulation in HCC patients has been reported by several studies [26,30,31], and RND3 may be a metastasis suppressor gene in HCC.
However, the expression patterns of four genes among the 20 most significant DEGs in the current study, TBCE, SPINT2, ECM1, and KZAN, were inconsistent with or not addressed in previous studies. The function of KZAN has not been identified, whereas the other three genes have all been comprehensively studied. These inconsistent results might offer new perspectives on the roles of these genes in the oncogenesis and development of HCC.
SPINT2 encodes a transmembrane protein with two extracellular Kunitz domains that inhibits a variety of serine proteases. The protein product of SPINT2 inhibits HGF activator, thereby preventing the formation of active hepatocyte growth factor, and has been regarded as a putative tumor suppressor [32]. Previous studies have mainly focused on the methylation of SPINT2 in HCC rather than its expression [33,34]. Nevertheless, we found that the expression level of SPINT2 was significantly suppressed in HCC expression profiles. This pattern is consistent with that in renal cell carcinoma [32], which might indicate its potential application as a novel HCC suppressor.
ECM1 encodes a soluble protein that is involved in endochondral bone formation, angiogenesis, and tumor biology. It interacts with a variety of extracellular and structural proteins, contributing to the maintenance of skin integrity and homeostasis [35]. The expression of ECM1 has been reported to be significantly up-regulated in HCC patients [24]; however, the current analysis of expression profiles showed that ECM1 expression was suppressed in HCC patients, which was confirmed by RT-PCR. This discrepancy points to complicated functions of ECM1 in the oncogenesis and development of HCC.

Conclusions
In short, the current study identified dysregulated genes in HCC through an integrated analysis of microarray datasets in the GEO database; the biological functions of these genes were significantly enriched in the cell cycle. The results of RT-PCR and TCGA validation were consistent with those of the integrated analysis, indicating the high credibility of this integrated analysis method. In addition, our study showed that some genes could be potentially valuable in clinical diagnosis (such as ASPM, NEK2, and CCT3) and anticancer therapy (such as DCN and RND3) for HCC. Our study improves the understanding of the transcriptome status of HCC and might shed light on further investigation of the mechanisms of HCC.

Additional file
Additional file 1: Table S1. Detail information of primers. Table S2.