- Open Access
Gene expression of OCT4, SOX2, KLF4 and MYC (OSKM) induced pluripotent stem cells: identification for potential mechanisms
Diagnostic Pathology volume 10, Article number: 35 (2015)
Somatic cells could be reprogrammed to induced pluripotent stem cells (iPS) by ectopic expression of OCT4, SOX2, KLF4 and MYC (OSKM). We aimed to gain insights into the early mechanisms underlying the induction of pluripotency.
GSE28688 containing 14 gene expression profiles were downloaded from GEO, including untreated human neonatal foreskin fibroblasts (HFF1) as control, OSKM-induced HFF1 (at 24, 48, 72 h post-transduction of OSKM encoding viruses), two iPS cell lines, and two embryonic stem (ES) cell lines. Differentially expressed genes (DEGs) were screened between different cell lines and the control by Limma package in Bioconductor. KEGG pathway enrichment analysis was performed by DAVID. The STRING database was used to construct protein-protein interaction (PPI) network. Activities and regulatory networks of transcription factors (TFs) were calculated and constructed by Fast Network Component Analysis (FastNCA).
Compared with untreated HFF1, 117, 347, 557, 2263 and 2307 DEGs were obtained from three point post-transduction HFF1, iPS and ES cells. Meanwhile, up-regulated DEGs in first two days of HFF1 were mainly enriched in RIG-I-like receptor (RLR) and Toll-like receptor (TLR) signaling pathways. Down-regulated DEGs at 72 h were significantly enriched in focal adhesion pathway which was similar to iPS cells. Moreover, ISG15, IRF7, STAT1 and DDX58 were with higher degree in PPI networks during time series. Furthermore, the targets of six selected TFs were mainly enriched in screened DEGs.
In this study, screened DEGs including ISG15, IRF7 and CCL5 participated in OSKM-induced pluripotency might attenuate immune response post-transduction through RLR and TLR signaling pathways.
The virtual slide(s) for this article can be found here: http://www.diagnosticpathology.diagnomx.eu/vs/2503890341543007.
Human embryonic stem (ES) cells have potential in cell replacement therapies using their regenerative properties. Disappointingly, there were many limitations for using of ES cells as therapeutic transplantation material, such as rejection , the risk of teratoma formation from residual ES cells  and inadequate cell number . In contrast, induced pluripotent stem (iPS) cells take advantages over ES cells. It is important to highlight the need to investigate differences between iPS and ES cells. In adult tissues and organs, fully differentiated cells rarely change from one type to another. However, somatic cells can be forcibly reprogrammed to pluripotency by cell fusion, somatic cell nuclear transfer and ectopic expression of defined factors including octamer binding transcription factor 4 (OCT4), SRY related high mobility group box protein 2 (SOX2), Kruppel like factor 4 (KLF4) and myelocytomatosis viral oncogene (MYC) (known as OSKM factors) [4-6]. Takahashi and Yamanaka established a critical landmark with generation of iPS cells from fibroblasts by simple ectopic expression of OSKM factors. Notably, the 2012 Noble Prize in Physiology and Medicine was awarded to researchers for their extraordinary contribution on reprogramming somatic cells to pluripotency . The advantages of OSKM-induced reprogramming to iPS cells were simplicity and robustness, as many different cell types from different species could be reprogrammed to pluripontency by ectopic expression of transcription factors . Therefore, iPS cells offer an expectation for patient-specific pluripotent stem cells therapy.
Generally, many groups have shown that both human and mouse somatic cells can be reprogrammed by ectopic expression of OSKM factors to pluripotent state [9,10]. And a number of technologies were performed to understand the molecular mechanisms of cellular reprogramming mediated by OSKM factors. Gene expression profiling in fibroblasts uncovered three phases of reprogramming termed initiation, maturation and stabilization . A mesenchymal-to-epithelial transition (MTE) was realized as a marker in initiation phase . Also, bone morphogenic protein (BMP) signaling played a critical role in the process of OSKM-induced pluriopotency . In the initiation phase, reprogrammable cells would firstly increase proliferation, then undergo histone modifications, initiate MET and followed by DNA demethylation and X-chromosome reactivation . Then pluripotent genes and developmental regulators were activated which will instigate the second phase. In the last phase, the cytoskeleton was remodeled to an ESC-like state. Polo and collaborators have confirmed the initial work of three phases by further unveiling the two waves of molecular changes during reprogramming process . It has been shown that the first transcriptional wave occurs in all cells and is mostly mediated by MYC, whereas the second wave is driven by OCT4/SOX2/KLF4 and is more restricted to reprogrammable cells. However, it was not fully known about how ectopic expression of OSKM induced fibroblasts to the pluripotent state.
To unravel the molecular mechanisms of this complicated process, microarray analysis was also performed to identify differentially expressed genes (DEGs) with p-value < 0.05 and enriched functions for DEGs . As will be discussed during our research, we utilized this approach to analyze enriched KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways and construct protein-protein interaction (PPI) network and transcriptionally regulatory network for screened DEGs.
The microarray data under the accession number GSE28688 is available at the NCBI Gene Expression Omnibus database (http://www.ncbi.nlm.nih.gov/geo/) based on the platform Illumina HumanRef-8 v3.0 expression beadchip, which composes of 14 samples including two HFF1 samples as control, six OSKM-induced HFF1 samples which were harvested 24, 48, 72 hours post-transduction, four human iPS cell lines and two human ES cell lines. Transductions were performed using pMX-based retroviral vectors each encoding the transcription factors OCT4, SOX2, KLF4, and c-MYC.
Data preprocessing and DEGs screening
To process gene expression dataset, the log2 of expression matrix which was preprocessed by rank invariant normalization in lumi package  was calculated. Illumina probes were then filtered from 24526 to 17669 as different probes could map to the same gene and average expression value was set as ultimate value. DEGs were identified from different comparisons between OSKM-induced HFF1 cells and the control, between iPS cells and the control, between ES cells and the control using Limma package  in Bioconductor with a t-test under Benjamini Hochberg correction . P < 0.05 and |log2FC| ≥ 1 were selected as the cutoff criteria.
KEGG pathway enrichment analysis
KEGG pathway enrichment analysis for DEGs were carried out by DAVID (Database for Annotation, Visualization and Integrated Discovery) . Pathways with p < 0.05 were identified as significance.
PPI network construction
Fast Network Component Analysis (FastNCA)
FastNCA is a fast method for determining both activities and regulatory influence for a cluster of transcription factors (TFs) . To study the regulation of TFs in the complex process, six TFs [FOXF2 (forkhead box F2), GATA2 (GATA binding protein 2), FOXA3 (forkhead box A3), SMAD6 (SMAD family member 6), STAT5B (signal transducer and activator of transcription 5B) and CNTN2 (contactin 2)] whose targets were enriched in screened DEGs were chosen. Then we calculated activities of these six TFs in ES cell, iPS cell and OSKM-induced HFF1 cells and correlation between activity and gene expression of TFs using this method. To predict interactions between different TFs, STRING  was utilized and interaction network was visualized by Cytoscape . Meanwhile, FastNCA was also performed to construct transcriptionally regulatory network for these six TFs and their target DEGs in different cell lines.
In order to gain insight into the molecular events during the early stage of reprogramming, we screened DEGs from comparisons between HFF1 cells at 24, 48, 72 h post-transduction of OSKM encoding viruses and HFF1 control, between HFF1-derived iPS cell lines and control, between the ES cell lines and control. As a result, 117, 347, 557, 2307 and 2263 DEGs were obtained, respectively (data was shown in Table 1). As shown, the number of screened DEGs in OSKM-induced HFF1 cell gradually increased with time, whilst the number of DEGs in iPS cells was nearly equal to ES cells. However, both up- and down-regulated DEGs were different in all comparisons.
KEGG pathway enrichment analysis
We looked for enriched KEGG pathways of DEGs (see Table 2). Up-regulated DEGs at 24 h and 48 h post-transduction were both mainly enriched in RIG-I-like receptor (RLR) signaling pathway and Toll-like receptor (TLR) signaling pathway, especially ISG15 (ISG15 ubiquitin-like modifier), STAT1 (signal transducer and activator of transcription 1), DDX58 [DEAD (Asp-Glu-Ala-Asp) box polypeptide 58], IRF7 (interferon regulatory factor 7) and CCL5 [chemokine (C-C motif) ligand 5]. While up-regulated DEGs at 72 h post-transduction were significantly enriched in amino acid metabolism pathway, and down-regulated DEGs were enriched in focal adhesion, regulation of actin cytoskeleton and pathway in cancer, specifically ATCB (actin, beta), ITGA2 (integrin, alpha 2) and PDGFRA (platelet-derived growth factor receptor, alpha polypeptide). Up-regulated DEGs in comparisons between ES cells and the control, between iPS cells and the control were both mainly enriched in basal cell carcinoma and Wnt signaling pathway, especially TP53 (tumor protein p53), but down-regulated DEGs were enriched in focal adhesion and ECM-receptor interaction pathway, especially CCND1 (tumor protein p53) and CD44.
PPI network construction
To identify key components of reprogramming process, we constructed PPI networks in five contrastive groups separately (data not shown). Because PPI networks were greatly complicated, the numbers of nodes and edges and proteins with higher degree in networks were shown in Tables 3 and 4.
TF activities calculated by FastNCA and correlation with gene expression of TFs
Figure 1A shows the estimated activities of six TFs. HFF1 cells as control were not treated by OSKM and activities of TFs in control were set as the base level. STAT5B, FOXF2, CNTA2 and SMAD6 were activated post-transduction of OSKM encoding viruses. STAT5B retained high activity at 24, 48 h in OSKM-induced HFF1 cells, iPS cells and ES cells compared with the control. FOXF2 activity returned to base level but peaked in iPS cells and ES cells. CNTN2 activities in OSKM-induced HFF1 cells and iPS cells were higher than base level. As to SMAD6, its activity was higher than base level just at 24 h post-transduction but returned to base level at 48 h. FOXA3 activities were higher just in iPS cells and ES cells but maintained base level in OSKM-induced HFF1 cells. GATA2 activities were lower in OSKM-induced HFF1 cells but higher in iPS cells and ES cells than base level.
Figure 1B demonstrated the correlation between activities of six TFs predicted by FastNCA and gene expression of these TFs. As shown, CNTN2 and STAT5B showed strong positive correlation between activities and expression possibly due to auto- or cross-regulation. On the other hand, the activities and expression were also strongly correlated for SMAD6, FOXA3, FOXF2 and GATA2. Positive correlation stated that TFs might participate in the same biological pathway or interact between each other.
We wondered if predicted correlation between TF activities and gene expression could be due to the interaction of two TFs, either as a complex or otherwise. Thus, TF pairs with significant activity correlation to published protein-protein interactions were checked (Figure 2). Intriguingly, TFs which were predicted to act together showed high correlation.
Regulatory network for TFs and DEGs
To gain insight into the enriched targets for TFs, regulatory networks were constructed for TFs and DEGs (Figures 3 and 4). In different regulatory network, the number of target DEGs varied widely (Table 5). In OSKM-induced HFF1 cells, screened DEGs were significantly targeted by STAT5B, FOXF2 and GATA2, but in iPS and ES cells, screened DEGs were mainly targeted by STAT5B, FOXA3 and FOXF2. As a result, GATA2 and FOXA3 might be the difference between somatic cells and pluripotent cells.
To drive somatic cells to the pluripotent state, viral transduction of OSKM factors is considered as the most robust method. Despite this, we do not fully elucidate the molecular mechanisms of reprogramming which induce somatic cells to pluripotency. To this end, we used microarray analysis to identify crucial events occurring within the first 72 hours of initiation phase. On the one hand, the screened DEGs during time series via different pathways regulated reprogramming process. On the other hand, significant TFs regulated target genes or interacted with other factors to affect reprogramming.
Following our finding from the pathway enrichment analysis, we demonstrated that up-regulated DEGs in the first 48 hours were enriched in RLR signaling pathway and TLR signaling pathway. These two pathways were reported to play an important role in immune response . Although somatic cell reprogramming by viral transduction is an effective method to obtain ES-like cells, the host cell immune response acts as a roadblock to efficient reprogramming. Targeted by TFs, ISG15, IRF7 and CCL5 were significantly expressed in these two pathways. Associated with transcriptionally regulatory network, ISG15 expression which was targeted by STAT5B and GATA2 factors was induced by virus infection. Based on accumulating evidence, it is proposed that virus-induced ISG15 expression would conjugate ubiquitin to RIG-I to inhibit RLR signaling and attenuate immune response . Together, these studies suggested that attenuation of HFF1 cell’s immune response is of benefit to reprogramming process. Meanwhile, virus infection triggers SUMOylation of IRF7 and this modification negatively regulated virus-stimulated interferon transcription . And TF GATA2, targeted with CCL5 and ISG15, has appeared to regulate the survival/proliferation of self-renewing stem cells . In our research, up-regulated DEGs including ISG15, IRF7 and CCL5 were accordance with the aforementioned information. Importantly, the results from PPI networks in the time series showed that CCL5 interacted with MYC, IRF7, ISG15, STAT1 and DDX58 which were mostly interferon-stimulated genes . Moreover, a number of reports have been published showing that MYC and other three factors induced somatic cells to pluripotent cells [29,30]. Consequently, MYC might participate in reprogramming process through interacting with CCL5 and other genes via RLR and TLR signaling pathway.
At 72 h post-transduction, down-regulated DEGs were enriched in focal adhesion and regulation of actin cytoskeleton pathways which reflected the potential establishment of cell-cell contact favorable for inducing pluripotency and were similar to iPS and ES cells, especially ATCB and ITGA2. ACTB, the target gene of STAT5B, interacted with MX1 which was a key mediator of the interferon-induced antiviral response against most of viruses through inhibiting viral primary transcription . ITGA2, as a member of integrin family, could activate focal adhesion kinase and lead to cell cycle progression and cell migration which were contributed to cell reprogramming . As a result, ACTB and ITGA2 which were targeted by TFs played a vital role in reprogramming process likely via focal adhesion pathway.
From microarray analysis for identified DEGs, results showed that gene expression of iPS cells was most similar to ES cells. Furthermore, gene expression of HFF1 cells at 72 h post-transduction was mostly alike with iPS cells. In summary, a series of interferon-stimulated genes including ISG15, IRF7 might regulate cell pluripotency via RLR and TLR signaling pathways to attenuate immune response for OSKM encoding viruses, but ATCB and MX1 participated in reprogramming perhaps through focal adhesion pathway. Nevertheless, future cell and animal experiments will be required to determine the role of these genes in OSKM-induced pluripotency.
Human embryonic stem
Induced pluripotent stem
Bone morphogenic protein
Differentially expressed genes
Kyoto Encyclopedia of Genes and Genomes
Octamer binding transcription factor 4
SRY related high mobility group box protein 2
Kruppel like factor 4
Myelocytomatosis viral oncogene
Wobus AM, Boheler KR. Embryonic stem cells: prospects for developmental biology and cell therapy. Physiol Rev. 2005;85:635–78.
Tang C, Drukker M. Potential barriers to therapeutics utilizing pluripotent cell derivatives: intrinsic immunogenicity of in vitro maintained and matured populations. Semin Immunopathol. 2011;33:563–72.
Fong CY, Gauthaman K, Bongso A. Teratomas from pluripotent stem cells: A clinical hurdle. J Cell Biochem. 2010;111:769–81.
Jaenisch R, Young R. Stem cells, the molecular circuitry of pluripotency and nuclear reprogramming. Cell. 2008;132:567–82.
Yamanaka S, Blau HM. Nuclear reprogramming to a pluripotent state by three approaches. Nature. 2010;465:704–12.
Pasque V, Miyamoto K, Gurdon JB. Efficiencies and mechanisms of nuclear reprogramming. Cold Spring Harb Symp Quant Biol. 2010;75:189–200.
Jaenisch R. Nuclear cloning and direct reprogramming: the long and the short path to Stockholm. Cell Stem Cell. 2012;11:744–7.
Stadtfeld M, Hochedlinger K. Induced pluripotency: history, mechanisms, and applications. Genes Dev. 2010;24:2239–63.
Takahashi K, Yamanaka S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell. 2006;126:663–76.
Maherali N, Ahfeldt T, Rigamonti A, Utikal J, Cowan C, Hochedlinger K. A high-efficiency system for the generation and study of human induced pluripotent stem cells. Cell Stem Cell. 2008;3:340–5.
Samavarchi-Tehrani P, Golipour A, David L, Sung HK, Beyer TA, Datti A, et al. Functional genomics reveals a BMP-driven mesenchymal-to-epithelial transition in the initiation of somatic cell reprogramming. Cell Stem Cell. 2010;7:64–77.
Li R, Liang J, Ni S, Zhou T, Qing X, Li H, et al. A mesenchymal-to-epithelial transition initiates and is required for the nuclear reprogramming of mouse fibroblasts. Cell Stem Cell. 2010;7:51–63.
Buganim Y, Faddah DA, Jaenisch R. Mechanisms and models of somatic cell reprogramming. Nat Rev Genet. 2013;14:427–39.
Polo JM, Anderssen E, Walsh RM, Schwarz BA, Nefzger CM, Lim SM, et al. A molecular roadmap of reprogramming somatic cells into iPS cells. Cell. 2012;151:1617–32.
Mah N, Wang Y, Liao MC, Prigione A, Jozefczuk J, Lichtner B, et al. Molecular insights into reprogramming-initiation events mediated by the OSKM gene regulatory network. PLoS One. 2011;6:e24351.
Du P, Kibbe WA, Lin SM. lumi: a pipeline for processing Illumina microarray. Bioinformatics. 2008;24:1547–8.
Smyth GK. limma: Linear Models for Microarray Data. In: Gentleman R, Carey V, Huber W, Irizarry R, Dudoit S, editors. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. New York: Springer; 2005. p. 397–420. Statistics for Biology and Health.
Hochberg Y, Benjamini Y. More powerful procedures for multiple significance testing. Stat Med. 1990;9:811–8.
da Huang W, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009;4:44–57.
Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, Roth A, et al. STRING v9. 1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res. 2013;41:D808–15.
Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T. Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics. 2011;27:431–2.
Chang C, Ding Z, Hung YS, Fung PC. Fast network component analysis (FastNCA) for gene regulatory network reconstruction from microarray data. Bioinformatics. 2008;24:1349–58.
Saito R, Smoot ME, Ono K, Ruscheinski J, Wang PL, Lotia S, et al. A travel guide to Cytoscape plugins. Nat Methods. 2012;9:1069–76.
Loo YM, Gale Jr M. Immune signaling by RIG-I-like receptors. Immunity. 2011;34:680–92.
Arimoto K, Konishi H, Shimotohno K. UbcH8 regulates ubiquitin and ISG15 conjugation to RIG-I. Mol Immunol. 2008;45:1078–84.
Kubota T, Matsuoka M, Chang TH, Tailor P, Sasaki T, Tashiro M, et al. Virus infection triggers SUMOylation of IRF3 and IRF7, leading to the negative regulation of type I interferon gene expression. J Biol Chem. 2008;283:25660–70.
Tsai FY, Keller G, Kuo FC, Weiss M, Chen J, Rosenblatt M, et al. An early haematopoietic defect in mice lacking the transcription factor GATA-2. Nature. 1994;371:221–6.
Schoggins JW, Rice CM. Interferon-stimulated genes and their antiviral effector functions. Curr Opin Virol. 2011;1:519–25.
Soufi A. Mechanisms for enhancing cellular reprogramming. Curr Opin Genet Dev. 2014;25:101–9.
Chin MH, Mason MJ, Xie W, Volinia S, Singer M, Peterson C, et al. Induced pluripotent stem cells and embryonic stem cells are distinguished by gene expression signatures. Cell Stem Cell. 2009;5:111–23.
Haller O, Kochs G. Human MxA protein: an interferon-induced dynamin-like GTPase with broad antiviral activity. J Interferon Cytokine Res. 2011;31:79–87.
Riemenschneider MJ, Mueller W, Betensky RA, Mohapatra G, Louis DN. In situ analysis of integrin and growth factor receptor signaling pathways in human glioblastomas suggests overlapping relationships with focal adhesion kinase activation. Am J Pathol. 2005;167:1379–87.
The authors declared that they have no competing interests.
YC: participated in the design of the study and performed the statistical analysis, and helped to draft the manuscript. XD: participated in the design of the study and performed the statistical analysis, and helped to draft the manuscript. QZ: performed the statistical analysis. ZD: performed the statistical analysis. All authors read and approved the final manuscript.
About this article
Cite this article
Cai, Y., Dai, X., Zhang, Q. et al. Gene expression of OCT4, SOX2, KLF4 and MYC (OSKM) induced pluripotent stem cells: identification for potential mechanisms. Diagn Pathol 10, 35 (2015) doi:10.1186/s13000-015-0263-7
- Transcriptional factor
- Protein-protein interaction network
- Regulatory network