Pyrosequencing data analysis software: a useful tool for EGFR, KRAS, and BRAF mutation analysis
© Shen and Qin; licensee BioMed Central Ltd. 2012
Received: 24 February 2012
Accepted: 28 May 2012
Published: 28 May 2012
Pyrosequencing is a new technology that can be used for mutation testing. However, its data analysis is a manual process that involves sophisticated algorithms, and human errors may occur during this process. A better way of analyzing pyrosequencing data is needed in the clinical diagnostic laboratory, and computer software is potentially useful for this purpose. We have developed such software, which performs pyrosequencing mutation data analysis for epidermal growth factor receptor, Kirsten rat sarcoma viral oncogene homolog and v-raf murine sarcoma viral oncogene homolog B1. The input data for analysis include the targeted nucleotide sequence, common mutations in the targeted sequence, the pyrosequencing dispensing order, the pyrogram peak order and the peak heights. The output includes the mutation type and the percentage of mutant gene in the specimen.
The data from 1375 pyrosequencing test results were analyzed using the software in parallel with manual analysis. The software was able to generate correct results for all 1375 cases.
The software developed is a useful molecular diagnostic tool for pyrosequencing mutation data analysis. This software can increase laboratory data analysis efficiency and reduce data analysis error rate.
The virtual slide(s) for this article can be found here: http://www.diagnosticpathology.diagnomx.eu/vs/1348911657684292.
Keywords: EGFR, KRAS, BRAF, Pyrosequencing, Software
Epidermal growth factor receptor (EGFR), Kirsten rat sarcoma viral oncogene homolog (KRAS) and v-raf murine sarcoma viral oncogene homolog B1 (BRAF) are oncogenes that may harbor mutations. Molecular diagnosis of these mutations is critical in making therapeutic decisions [1–8]. Pyrosequencing is a direct sequencing technology that can be used to detect these mutations [9–14]. In our clinical molecular diagnostic laboratory, pyrosequencing is used for EGFR (codons 719, 746–753, 768, 790 and 858), KRAS (codons 12, 13 and 61) and BRAF (codon 600) mutation tests.
When compared with Sanger sequencing, pyrosequencing has several advantages. First, it has higher sensitivity: Sanger sequencing needs a tumor load greater than 20 % in a specimen to render a reliable result, whereas pyrosequencing can render a reliable result with a 5 % tumor load. Second, pyrosequencing is faster than Sanger sequencing. Third, pyrosequencing is more cost effective. One disadvantage of pyrosequencing is that it can only sequence a short stretch of nucleotides. Another is that pyrosequencing data analysis can sometimes be complex and challenging.
The pyrosequencing data analysis for EGFR, KRAS and BRAF is a manual process. The output of pyrosequencing is a pyrogram, a series of peaks of different heights that reflect the nucleotide sequence of the targeted DNA segment. Several variables need to be considered during pyrogram analysis: the dispensing order, the pyrogram peak sequence, the peak heights, the wildtype sequence of the targeted gene, the possible mutations in the targeted gene, and the ratio of wildtype to mutant genes in a given specimen. Although the analysis is relatively straightforward for some mutations, it can be complex for others. Moreover, the ratio of wildtype to mutant gene copies varies from case to case, which further complicates the analysis. Pyrosequencing data analysis is therefore a relatively sophisticated manual process during which human errors can occur. We developed a computer software program to facilitate pyrogram data analysis.
The pyrosequencing data from 1375 de-identified routine clinical mutation tests were analyzed; this is the total number of pyrosequencing tests performed in our lab from February 2011 to December 2011. Specimen DNA was extracted from unstained paraffin sections using a QIAcube (Qiagen, Valencia, CA 91355) after manual micro-dissection.
Targeted Mutations and Sequences for EGFR, KRAS and BRAF mutations
EGFR exon 18, codon 719
EGFR exon 19 deletions
EGFR exon 20, codon 768
EGFR exon 20, codon 790
EGFR exon 21
KRAS codon 12 & 13
KRAS codon 61 (reverse sequencing)
BRAF codon 600 (reverse sequencing)
Targeted Mutations and Pyrosequencing Dispensing Orders for EGFR, KRAS and BRAF mutations
Pyrosequencing Dispensing Orders
EGFR exon 18, codon 719
EGFR exon 19 deletions
EGFR exon 20, codon 768 and insertions
EGFR exon 20, codon 790
EGFR exon 21
KRAS codon 12 & 13
KRAS codon 61 (reverse sequencing)
BRAF codon 600 (reverse sequencing)
During pyrosequencing, a sequencing primer hybridizes to the targeted DNA template. DNA polymerase uses deoxyribonucleotide triphosphates (dNTPs) to synthesize a new DNA strand along the template, starting from the 3′ end of the sequencing primer. The dNTPs are dispensed into the reaction one at a time according to the dispensing order. When a dispensed dNTP is complementary to the next nucleotide in the DNA template, it is incorporated into the newly synthesized strand and a pyrophosphate (PPi) is released. ATP sulfurylase converts the released PPi into adenosine triphosphate (ATP), which luciferase then uses to convert luciferin to oxyluciferin, generating visible light in proportion to the amount of ATP. The light is captured and depicted as a peak in the pyrogram. Each peak is labeled A, C, G or T according to which dNTP was dispensed at that time, and its height is proportional to the number of complementary bases incorporated at that dispensation.
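Because each peak height is proportional to the number of bases incorporated at that dispensation, an idealized pyrogram can be predicted from the synthesized sequence and the dispensing order alone. The Python sketch below is illustrative only: the function names and toy sequences are hypothetical, `seq` is assumed to be the sequence synthesized from the primer onward, and real-world effects such as background noise and uneven peak heights are ignored. A specimen containing both wildtype and mutant copies is modeled as a weighted average of the two ideal patterns.

```python
from typing import List


def expected_peaks(seq: str, dispensation: str) -> List[int]:
    """Ideal peak height at each dispensation = number of identical bases incorporated.

    `seq` is the sequence synthesized from the primer onward. A dispensed dNTP that
    does not match the next base gives a zero peak; a matching dNTP extends through
    the whole homopolymer run in a single step.
    """
    heights: List[int] = []
    i = 0
    for nt in dispensation:
        n = 0
        while i < len(seq) and seq[i] == nt:
            n += 1
            i += 1
        heights.append(n)
    return heights


def mixed_peaks(wildtype: str, mutant: str, dispensation: str,
                mutant_fraction: float) -> List[float]:
    """Expected pyrogram when a fraction of template copies carries the mutation.

    Light output adds up over copies, so each peak is a weighted average of the
    wildtype and mutant ideal heights.
    """
    w = expected_peaks(wildtype, dispensation)
    m = expected_peaks(mutant, dispensation)
    return [(1 - mutant_fraction) * a + mutant_fraction * b for a, b in zip(w, m)]


if __name__ == "__main__":
    print(expected_peaks("AACGT", "ACGTA"))             # [2, 1, 1, 1, 0]
    print(mixed_peaks("AACGT", "AAGGT", "ACGTA", 0.5))  # [2.0, 0.5, 1.5, 1.0, 0.0]
```

The zero-height peaks are what make dispensing order matter: a mutation can create a peak where the wildtype has none, or remove one where the wildtype has a peak.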
The software was developed in Microsoft Excel and is designed to identify common mutations in EGFR, KRAS and BRAF. The common mutations were taken from the COSMIC database (http://www.sanger.ac.uk/genetics/CGP/cosmic/, as of 6 June 2011). Part of the test information is built into the software: the targeted nucleotide sequence, the common mutations and the pyrosequencing dispensing order. The remaining information is entered after testing, when the raw result data are available. The raw data consist of the pyrogram peak sequence and the peak heights, which are copied and pasted into the software for analysis.
The analysis algorithm involves multiple steps, illustrated here using an EGFR mutation as an example. Step 1 is pattern recognition: the software compares the peak sequence and peak heights of the test result with the wildtype peak pattern built into the software. If the result fits the wildtype pattern, the software calls it wildtype. If it does not, the software compares it with the common mutant peak patterns built into the software, and if it fits one of them, that mutation is considered a candidate mutation. In the example shown in Figure 1, the peak pattern fits the EGFR L858R mutation, so the software takes L858R as the candidate mutation and proceeds to the next step, in which it calculates the percentage of the candidate mutant gene in the specimen using a built-in formula. For the EGFR exon 21 L858R mutation, the formula is as follows:
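$$\text{Mutant gene (\%)} = \frac{1}{3}\left[\frac{1}{3}\cdot\frac{A}{B} + \left(1 - \frac{C}{B}\right) + \left(1 - \frac{D}{E}\right)\right] \times 100$$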
“A” is the height of the second peak (labeled ‘G’ at dispensing position 3 in Figure 1).
“B” is the average height of the reference peaks, each of which results from a single nucleotide incorporation. The reference peaks are the first peak (labeled ‘C’ at dispensing position 2 in Figure 1), the seventh peak (‘C’ at dispensing position 9), the eighth peak (‘T’ at dispensing position 11), the ninth peak (‘G’ at dispensing position 12) and the tenth peak (‘C’ at dispensing position 14).
“C” is the height of the third peak (labeled ‘T’ at dispensing position 4 in Figure 1).
“D” is the height of the fourth peak (labeled ‘G’ at dispensing position 5 in Figure 1).
“E” is the height of the fifth peak (labeled ‘C’ at dispensing position 7 in Figure 1).
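The L858R formula can be transcribed directly into Python to check a calculation by hand; the function name and argument order below are hypothetical, and B is taken as the mean of the five reference peaks. As a plausibility check, we assume (an inference from the structure of the formula, not something the text states) that A is a mutant-only peak reaching three single-base units in a pure mutant, C is a wildtype-only single-base peak, and D is a wildtype-only peak whose full wildtype height equals that of E. Under those assumptions each bracketed term is an independent estimate of the mutant fraction, and the formula averages the three.

```python
from statistics import mean
from typing import Sequence


def l858r_mutant_percent(a: float, reference_peaks: Sequence[float],
                         c: float, d: float, e: float) -> float:
    """Percentage of mutant gene for EGFR L858R, transcribing the formula above.

    a: second peak (G, dispensing position 3)
    reference_peaks: single-incorporation reference peaks; B is their mean
    c: third peak (T, position 4); d: fourth peak (G, position 5);
    e: fifth peak (C, position 7)
    """
    b = mean(reference_peaks)
    est_from_a = (a / b) / 3   # the (1/3)(A/B) term
    est_from_c = 1 - c / b     # the (1 - C/B) term
    est_from_d = 1 - d / e     # the (1 - D/E) term
    return (est_from_a + est_from_c + est_from_d) / 3 * 100


if __name__ == "__main__":
    f = 0.55  # hypothetical mutant fraction used to simulate ideal peak heights
    print(l858r_mutant_percent(a=3 * f, reference_peaks=[1.0] * 5,
                               c=1 - f, d=2 * (1 - f), e=2.0))  # about 55
```

Averaging three estimates rather than relying on one peak damps the effect of noise in any single peak height.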
Because each mutation has its own characteristic pyrogram peak pattern, a separate formula is programmed into the software for each mutation. If the calculated percentage of mutant component is higher than 5 %, the software calls the specimen mutant; if it is lower than 5 %, the software does not call it mutant, because our validated test sensitivity is set at 5 %. The software analysis result is shown in Figure 1B. It indicates that the second peak (G) comes from the mutant; the third and fourth peaks (T and G) come from the wildtype; the remaining peaks represent a mixture of mutant and wildtype; and 55 % of the targeted nucleotide sequence in this specimen is from the mutant gene.
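The text describes pattern recognition, percentage calculation and the 5 % call only in prose, and the software itself was built in Excel. As an illustration only, the minimal Python sketch below swaps in a different technique: every candidate pattern (wildtype or a known mutant) is fitted to the observed peaks as a wildtype/mutant mixture by least squares, the best-fitting candidate is kept, and its mutant percentage is compared with the 5 % cutoff. It assumes peak heights are already normalized so that a single-base incorporation equals 1.0. The function names, the cyclic `ACGT` dispensation and the sequences are hypothetical; the sequences are toy fragments in which codon 600 (GTG) is changed to GAG, AAG or AGG for V600E, V600K and V600R, not the assay's reverse-strand design.

```python
from typing import Dict, List, Tuple


def expected_peaks(seq: str, dispensation: str) -> List[int]:
    """Ideal peak heights: number of bases of `seq` incorporated at each dispensation."""
    heights, i = [], 0
    for nt in dispensation:
        n = 0
        while i < len(seq) and seq[i] == nt:
            n += 1
            i += 1
        heights.append(n)
    return heights


def match_and_call(observed: List[float], wildtype: str, mutants: Dict[str, str],
                   dispensation: str, cutoff_pct: float = 5.0) -> Tuple[str, float, bool]:
    """Return (best-fitting pattern, mutant percentage, called mutant?)."""
    w = expected_peaks(wildtype, dispensation)
    # Baseline candidate: pure wildtype (mutant fraction 0).
    best_name, best_f = "wildtype", 0.0
    best_resid = sum((o - a) ** 2 for o, a in zip(observed, w))
    for name, seq in mutants.items():
        d = [b - a for a, b in zip(w, expected_peaks(seq, dispensation))]
        denom = sum(x * x for x in d)
        if denom == 0:
            continue  # this dispensation cannot tell the mutant from wildtype
        # Least-squares fraction f for observed ≈ w + f * d, clipped to [0, 1].
        f = sum((o - a) * x for o, a, x in zip(observed, w, d)) / denom
        f = min(max(f, 0.0), 1.0)
        resid = sum((o - a - f * x) ** 2 for o, a, x in zip(observed, w, d))
        if resid < best_resid:
            best_name, best_f, best_resid = name, f, resid
    pct = 100.0 * best_f
    # Call mutant only when the percentage is higher than the cutoff.
    return best_name, pct, best_name != "wildtype" and pct > cutoff_pct


# Hypothetical toy sequences and dispensation order.
WT = "ACAGTGAAAT"
MUTANTS = {"V600E": "ACAGAGAAAT", "V600K": "ACAAAGAAAT", "V600R": "ACAAGGAAAT"}
DISP = "ACGT" * 6

if __name__ == "__main__":
    w, k = expected_peaks(WT, DISP), expected_peaks(MUTANTS["V600K"], DISP)
    observed = [0.7 * a + 0.3 * b for a, b in zip(w, k)]  # simulated 30 % V600K
    print(match_and_call(observed, WT, MUTANTS, DISP))     # V600K at about 30 %, called
```

Fitting a mixture fraction, rather than testing only pure patterns, lets one comparison absorb the case-to-case variation in the wildtype-to-mutant ratio. A result of exactly 5 % is treated here as not mutant, a boundary the text does not specify.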
EGFR, BRAF and KRAS Mutation Test Analysis
EGFR, BRAF and KRAS mutation tests are performed routinely by pyrosequencing in our clinical molecular lab. The raw data are analyzed manually by two lab staff members independently before a result is issued. For this project, the software was used to analyze the pyrosequencing data independently, in parallel with the manual analysis, and the manual analysis results were compared with the software-generated results.
EGFR Mutation Data Analysis:
Comparison of Manual and Software Data Analysis Results for EGFR mutations
BRAF Mutation Data Analysis:
Comparison of Manual and Software Data Analysis Results for BRAF mutations
KRAS Mutation Data Analysis:
Comparison of Manual and Software Data Analysis Results for KRAS mutations
Among the 1375 tests analyzed, the first reviewer’s manual analysis identified 347 positive and 1028 negative results. The software identified 351 positive and 1024 negative results, which were confirmed by a second reviewer. Comparing the manual and software analysis results gives a chi-square value of 0.061 and a two-tailed p value of 0.8046. The difference between the two detection rates is not statistically significant, and the software may serve as a useful quality control tool.
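The text does not name the statistical test. The reported values can be reproduced by a one-degree-of-freedom goodness-of-fit statistic that treats the software counts as the expected values; the Python sketch below shows that reconstruction, which is our assumption about how the numbers were obtained. The function names are hypothetical, and the p value uses the identity that a 1-df chi-square variable is the square of a standard normal variable.

```python
import math
from typing import Sequence


def goodness_of_fit_chi2(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Pearson statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf_1df(x: float) -> float:
    """Upper-tail p value for a chi-square variable with 1 degree of freedom.

    With 1 df, chi2 = Z^2, so P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))


manual = (347, 1028)    # positives, negatives from the first reviewer's manual analysis
software = (351, 1024)  # positives, negatives from the software, used as expected counts

chi2 = goodness_of_fit_chi2(manual, software)
p = chi2_sf_1df(chi2)
print(f"chi-square = {chi2:.3f}, p = {p:.4f}")  # chi-square = 0.061, p = 0.8046
```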
The main source of error in computerized analysis is the use of suboptimal parameters when building the software for certain mutations. For example, for the V600K mutation, the parameter for the lowered fourth peak (C) was initially set as “the height of the fourth peak C is lower than 95 % of the average height of the equivalent normal peaks”. In this assay, the dispensing order is TCGTATCTGTAG, and the sixth, seventh and ninth peaks (labeled G, T and G at dispensing positions 9, 10 and 12) are used to calculate the average normal peak height. During testing, we found that although this setting recognized some V600K mutations, it occasionally misinterpreted V600K cases as V600E. The parameter was therefore modified in two ways: both the fourth peak (C) and the fifth peak (T) are now used in the calculation instead of the fourth peak alone, and the “95 % of the average” cutoff was replaced by a cutoff two standard deviations below the average. The modified software was tested and interpreted the data correctly. The standard deviation appears to reflect normal variation better than an arbitrary 95 % cutoff. Such modifications are part of the fine-tuning process of this software development.
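The two V600K rules are described only in words, so the Python sketch below rests on two assumptions of ours: the revised rule requires both the fourth (C) and fifth (T) peaks to fall below the cutoff, and the standard deviation is the sample standard deviation of the same reference peaks (the sixth, seventh and ninth). The function names and peak heights are hypothetical. The example only shows how the two cutoffs differ; it does not reproduce the V600K/V600E misclassification described above.

```python
from statistics import mean, stdev
from typing import Sequence


def original_v600k_rule(peak4_c: float, refs: Sequence[float]) -> bool:
    """Initial rule: fourth peak C lower than 95 % of the average reference peak."""
    return peak4_c < 0.95 * mean(refs)


def revised_v600k_rule(peak4_c: float, peak5_t: float, refs: Sequence[float]) -> bool:
    """Revised rule (assumed form): both C and T more than 2 SD below the average."""
    cutoff = mean(refs) - 2 * stdev(refs)  # sample SD; needs at least two reference peaks
    return peak4_c < cutoff and peak5_t < cutoff


if __name__ == "__main__":
    refs = [1.0, 1.05, 0.95]  # hypothetical sixth, seventh and ninth peak heights
    # A modest dip in C alone trips the 95 % rule but not the 2-SD rule.
    print(original_v600k_rule(0.93, refs), revised_v600k_rule(0.93, 1.0, refs))  # True False
    # Both peaks clearly lowered: the revised rule fires.
    print(revised_v600k_rule(0.6, 0.6, refs))  # True
```

A cutoff derived from the spread of the reference peaks widens automatically for noisier runs, whereas a fixed 95 % cutoff does not.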
Normally, two individuals check sequencing results to minimize human error. In this project, we used our software to check a total of 1375 test results (355 EGFR, 613 BRAF and 407 KRAS). The software detected 4 errors from the first round of manual analysis, all of which were also detected by the second reviewer. These results indicate that the pyrosequencing data analysis software can serve as an additional layer of quality control.
The pattern recognition concept has been used previously in software for pyrosequencing data analysis. For example, Joakim Lundeberg and colleagues applied it to SNPs on chromosome 9 [18]. Qiagen’s pyrosequencing software can provide pyrogram patterns for pure homozygous and heterozygous results of the most common mutations in EGFR, KRAS and BRAF [15–17]. A recently described program, Pyromaker, can provide simulated pyrogram patterns for different percentages of tumor cells [19]. The software developed in our lab, in contrast, analyzes real case data: the data are input into the software, and the output indicates the mutation type and the percentage of mutant gene in the specimen. Our software also covers a wider range of mutations in EGFR, KRAS and BRAF. For example, it has been tailored to distinguish BRAF V600E, V600K and V600R mutations, and it can distinguish the common variants of EGFR exon 19 deletions. It is also fine-tuned to accommodate normal variation in clinical mutation tests. These features make it a practical tool for pyrosequencing data analysis of real cases.
Based on our literature search using keywords such as pyrosequencing, software, EGFR, KRAS and BRAF, our software is a unique program developed for EGFR, KRAS and BRAF pyrosequencing data analysis.
Because the software is designed and fine-tuned by our lab staff members, it can only be as good as they are. However, the staff’s knowledge and experience can be built into the software during the fine-tuning process. With this collective expertise, the software may perform better than a single staff member performing manual analysis. Moreover, the software works more consistently and objectively than a human does, which makes it a valuable quality control tool.
Fine-tuning is also an ongoing training process for the software, especially for rare mutations. The first stage of fine-tuning used data from 490 mutation test results, and this process will continue in our lab as more cases are analyzed, with the molecular lab staff serving as trainers. Whenever the software misreads a new mutation, our lab will update the software to cover it, adjusting the analysis parameters so that the software recognizes the new mutation correctly without losing specificity. Our software is an open system, and coverage of additional mutations can be added when needed.
The pyrosequencing data analysis software we developed is a useful tool that can substantially increase the efficiency and consistency of pyrosequencing data analysis.
EGFR: Epidermal growth factor receptor
KRAS: Kirsten rat sarcoma viral oncogene homolog
BRAF: v-raf murine sarcoma viral oncogene homolog B1
We thank Kaaron Benson, MD for reviewing and editing the manuscript.
- Pao W, Miller V, Zakowski M, Doherty J, Politi K, Sarkaria I, et al.: EGF receptor gene mutations are common in lung cancers from "never smokers" and are associated with sensitivity of tumors to gefitinib and erlotinib. Proc Natl Acad Sci U S A. 2004, 101 (36): 13306-13311. 10.1073/pnas.0405220101.
- Paez JG, Janne PA, Lee JC, Tracy S, Greulich H, Gabriel S, et al.: EGFR mutations in lung cancer: Correlation with clinical response to gefitinib therapy. Science. 2004, 304 (5676): 1497-1500. 10.1126/science.1099314.
- Lynch TJ, Bell DW, Sordella R, Gurubhagavatula S, Okimoto RA, Brannigan BW, et al.: Activating mutations in the epidermal growth factor receptor underlying responsiveness of non-small-cell lung cancer to gefitinib. N Engl J Med. 2004, 350 (21): 2129-2139. 10.1056/NEJMoa040938.
- Lievre A, Bachet JB, Le Corre D, Boige V, Landi B, Emile JF, et al.: KRAS mutation status is predictive of response to cetuximab therapy in colorectal cancer. Cancer Res. 2006, 66 (8): 3992-3995. 10.1158/0008-5472.CAN-06-0191.
- Amado RG, Wolf M, Peeters M, Van Cutsem E, Siena S, Freeman DJ, et al.: Wild-type KRAS is required for panitumumab efficacy in patients with metastatic colorectal cancer. J Clin Oncol. 2008, 26 (10): 1626-1634. 10.1200/JCO.2007.14.7116.
- Karapetis CS, Khambata-Ford S, Jonker DJ, O'Callaghan CJ, Tu D, Tebbutt NC, et al.: K-ras mutations and benefit from cetuximab in advanced colorectal cancer. N Engl J Med. 2008, 359 (17): 1757-1765. 10.1056/NEJMoa0804385.
- Flaherty KT, Puzanov I, Kim KB, Ribas A, McArthur GA, Sosman JA, et al.: Inhibition of mutated, activated BRAF in metastatic melanoma. N Engl J Med. 2010, 363 (9): 809-819. 10.1056/NEJMoa1002011.
- Chapman PB, Hauschild A, Robert C, Haanen JB, Ascierto P, Larkin J, et al.: Improved survival with vemurafenib in melanoma with BRAF V600E mutation. N Engl J Med. 2011, 364 (26): 2507-2516. 10.1056/NEJMoa1103782.
- Ronaghi M, Uhlen M, Nyren P: A sequencing method based on real-time pyrophosphate. Science. 1998, 281 (5375): 363-365.
- Ronaghi M, Karamohamed S, Pettersson B, Uhlen M, Nyren P: Real-time DNA sequencing using detection of pyrophosphate release. Anal Biochem. 1996, 242 (1): 84-89. 10.1006/abio.1996.0432.
- Dufort S, Richard MJ, Lantuejoul S, de Fraipont F: Pyrosequencing, a method approved to detect the two major EGFR mutations for anti EGFR therapy in NSCLC. J Exp Clin Cancer Res. 2011, 30: 57. 10.1186/1756-9966-30-57.
- Dufort S, Richard MJ, de Fraipont F: Pyrosequencing method to detect KRAS mutation in formalin-fixed and paraffin-embedded tumor tissues. Anal Biochem. 2009, 391 (2): 166-168. 10.1016/j.ab.2009.05.027.
- Ibrahem S, Seth R, O'Sullivan B, Fadhil W, Taniere P, Ilyas M: Comparative analysis of pyrosequencing and QMC-PCR in conjunction with high resolution melting for KRAS/BRAF mutation detection. Int J Exp Pathol. 2010, 91 (6): 500-505. 10.1111/j.1365-2613.2010.00733.x.
- Packham D, Ward RL, Ap Lin V, Hawkins NJ, Hitchins MP: Implementation of novel pyrosequencing assays to screen for common mutations of BRAF and KRAS in a cohort of sporadic colorectal cancers. Diagn Mol Pathol. 2009, 18 (2): 62-71. 10.1097/PDM.0b013e318182af52.
- Qiagen: EGFR pyro handbook. 2010, Valencia, USA
- Qiagen: PyroMark® KRAS v2.0 handbook. 2010, Valencia, USA
- Qiagen: BRAF pyro® handbook. 2010, Valencia, USA
- Ahmadian A, Gharizadeh B, Gustafsson AC, Sterky F, Nyren P, Uhlen M, et al.: Single-nucleotide polymorphism analysis by pyrosequencing. Anal Biochem. 2000, 280 (1): 103-110. 10.1006/abio.2000.4493.
- Chen G, Olson MT, O'Neill A, Norris A, Beierl K, Harada S, et al.: A virtual pyrogram generator to resolve complex pyrosequencing results. J Mol Diagn. 2012, 14 (2): 149-159. 10.1016/j.jmoldx.2011.12.001.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.