Clinical application of genomic data in cancer

Published: 2018-10-23 00:00

With the adoption of high-throughput experimental techniques such as sequencing in tumor research, a growing volume of genomic data has been used in recent years to diagnose and treat tumor patients and to predict their prognosis. The number of analytical methods used to analyze and interpret these data has grown accordingly.

The major analysis steps and tools for the genome are shown in Figure 1 [1]. 1. Identification of mutations: aligning sequencing reads to the genome (alignment), identifying mutations (variant calling), distinguishing somatic mutations from germline mutations, and annotating the mutation data. 2. Interpretation of mutations: identifying molecular changes that can be used to guide targeted treatment, and dividing patients into risk groups through survival analysis to support individualized detection and treatment [2].


In this article, we describe two analytical tools that use genomic data to predict prognosis and guide therapy.

1. Application of genomic data to predict prognosis in cancer patients

(1) Survival analysis with survival time as a continuous variable

All samples are first divided into a training cohort (used to screen prognostic variables and build the survival model) and a test cohort (used to test the survival model built in the training cohort).

In general, 80% of all samples are randomly assigned to the training cohort (training set) and the remaining 20% to the test cohort (test set). In the training cohort, prognostic variables are first selected by univariate Cox survival analysis (P < 0.05).
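The 80/20 random split described above can be sketched in a few lines. This is a minimal illustration only: the function name `split_cohort`, the fixed seed and the toy patient IDs are assumptions made for the example, not part of the original workflow, which uses R.

```python
import random

def split_cohort(sample_ids, train_fraction=0.8, seed=0):
    """Randomly split sample IDs into a training and a test cohort."""
    ids = list(sample_ids)
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is reproducible
    rng.shuffle(ids)
    n_train = int(round(train_fraction * len(ids)))
    return ids[:n_train], ids[n_train:]

# Toy cohort of 50 hypothetical patients
samples = [f"patient_{i:03d}" for i in range(50)]
train_ids, test_ids = split_cohort(samples)
print(len(train_ids), len(test_ids))
```

Variable screening and model fitting would then use only `train_ids`, with `test_ids` held back for evaluation.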

The selected prognostic variables are then modeled with two methods: 1. Cox: a multivariate Cox regression model and a LASSO model; 2. RSF: Random Survival Forest. Finally, the predictive validity of the resulting prognostic model is evaluated by the concordance index (C-index; see Figure 2).
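To make the C-index concrete, here is a minimal sketch of Harrell's concordance index in plain Python. It assumes right-censored data, a risk score where higher means worse predicted outcome, and half credit for tied scores; the toy numbers are invented for illustration. In practice the R packages listed in this article report this statistic directly.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable patient pairs ranked correctly.

    times:       observed follow-up times
    events:      1 if the event (e.g. death) was observed, 0 if censored
    risk_scores: model output, higher = higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i has an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Invented toy data: five patients
times = [5, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.4, 0.6, 0.3, 0.1]
c_index = concordance_index(times, events, risk)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk.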

In the above analysis process, the main analytical tools are:

Cox univariate and multivariate analysis: the "survival" package in R;

LASSO analysis: the "glmnet" package in R;

Random Survival Forest analysis: the R package "randomSurvivalForest" or its newer successor, the "randomForestSRC" package.


(2) Analysis of prognostic status as a categorical variable

Setting a cutoff converts the continuous survival time into a categorical variable with two levels: "good prognosis" and "poor prognosis".
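A minimal sketch of this dichotomization follows. The 36-month cutoff, the function name `label_prognosis` and the handling of censored patients are assumptions for illustration; the article does not specify them. Patients censored before the cutoff are left unlabeled here because their status at the cutoff is unknown.

```python
def label_prognosis(time, event, cutoff):
    """Return 'poor', 'good', or None for one patient.

    time:  follow-up time (same unit as cutoff)
    event: 1 = death observed, 0 = censored
    """
    if event == 1 and time <= cutoff:
        return "poor"   # died at or before the cutoff
    if time > cutoff:
        return "good"   # known to be alive beyond the cutoff
    return None         # censored before the cutoff: outcome unknown

# Invented toy data: (follow-up in months, event indicator)
patients = [(10, 1), (40, 1), (50, 0), (12, 0)]
labels = [label_prognosis(t, e, cutoff=36) for t, e in patients]
print(labels)
```

Unlabeled patients would typically be excluded before training a classifier.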

First, prognostic variables can be selected in two ways: ANOVA and shrunken centroids.

Eight algorithms are then used to classify patients by the selected prognostic variables: Diagonal Discriminant Analysis (DDA), K-Nearest Neighbor (KNN), Discriminant Analysis (DA), Logistic Regression (LR), Nearest Centroid (NC), Partial Least Squares (PLS), Random Forest (RF) and Support Vector Machine (SVM). 10-fold cross-validation is used to assess utility. The predictive power of the resulting prognostic model is evaluated by the C-index.
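As an illustration of how one of these classifiers can be assessed with 10-fold cross-validation, the sketch below implements a plain nearest-centroid classifier and pools its held-out predictions into an accuracy. The helper names and the synthetic two-feature data are assumptions for the example; it reports accuracy rather than the C-index, and it is not the pipeline used in the cited study.

```python
import random

def nearest_centroid_fit(X, y):
    """Compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for k, value in enumerate(row):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def nearest_centroid_predict(centroids, row):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda lab: dist(centroids[lab]))

def cross_validate(X, y, k=10, seed=0):
    """Return the pooled accuracy of held-out predictions over k folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        centroids = nearest_centroid_fit([X[i] for i in train],
                                         [y[i] for i in train])
        correct += sum(nearest_centroid_predict(centroids, X[i]) == y[i]
                       for i in fold)
    return correct / len(X)

# Synthetic data: two well-separated groups of 30 patients, two features each
rng = random.Random(1)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]
     + [[rng.gauss(6, 1), rng.gauss(6, 1)] for _ in range(30)])
y = ["good"] * 30 + ["poor"] * 30
accuracy = cross_validate(X, y)
print(accuracy)
```

Each sample is predicted exactly once by a model that never saw it during fitting, which is what keeps the estimate from being optimistic.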

2. Identifying somatic mutations in clinically relevant genes

Precision Heuristics for Interpreting the Alteration Landscape (PHIAL):

The Precision Heuristics for Interpreting the Alteration Landscape (PHIAL) algorithm classifies somatic mutations by the clinical relevance of the mutated gene, and thereby identifies potential therapeutic targets.

Clinically relevant genes are genes whose somatic mutations can cause resistance or response to treatment and/or affect diagnosis or prognosis. A clinically actionable gene is any gene in cancer whose somatic mutations can predict the efficacy of, or resistance to, a particular cancer treatment, or can help diagnose the disease or predict prognosis.

First, through literature search, manual screening and expert advice, 121 actionable genes were identified and stored in the TARGET database ( http : / / cancer / cga / target ). Genes in the database can be used to guide treatment, predict prognosis and assist diagnosis. PHIAL is built around the 121 TARGET genes according to three principles: 1. ranking mutations by clinical and biological relevance; 2. linking TARGET genes to other biologically relevant pathways or gene sets; 3. demoting mutations of unknown significance.

To distinguish and rank mutations as fully as possible, further criteria are applied to the classification of mutations in addition to membership in TARGET. These include: alterations in Cancer Gene Census genes; genes that share a curated MSigDB cancer pathway with an actionable gene altered in the same sample; genes in cancer pathways or gene sets from MSigDB; and mutations recorded in COSMIC (Figure 3). All PHIAL code can be obtained as an R package ( http : / / cancer / cga / phial / ) [3, 4].
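The layered criteria above can be pictured as a tiered sort. The sketch below is a loose illustration and not the PHIAL implementation: the gene names are placeholders, the gene sets are invented stand-ins for TARGET, the Cancer Gene Census, MSigDB and COSMIC, and the tier order simply follows the order in which the criteria are listed in this article.

```python
# Hypothetical placeholder gene sets; a real analysis would load TARGET,
# the Cancer Gene Census, MSigDB and COSMIC from their own sources.
ACTIONABLE = {"GENE_A", "GENE_B"}
CANCER_GENE_CENSUS = {"GENE_A", "GENE_C"}
PATHWAYS = {"pathway_1": {"GENE_A", "GENE_D"}, "pathway_2": {"GENE_E"}}
COSMIC_GENES = {"GENE_F"}

def tier(gene, sample_genes):
    """Lower tier number = higher priority for review (assumed ordering)."""
    if gene in ACTIONABLE:
        return 1
    if gene in CANCER_GENE_CENSUS:
        return 2
    altered_actionable = sample_genes & ACTIONABLE
    for members in PATHWAYS.values():
        if gene in members and members & altered_actionable:
            return 3  # shares a pathway with an actionable gene altered in this sample
    if any(gene in members for members in PATHWAYS.values()):
        return 4      # in a cancer pathway or gene set
    if gene in COSMIC_GENES:
        return 5      # previously recorded in the mutation catalogue
    return 6          # unknown significance: demoted to the bottom

def rank_mutations(sample_genes):
    """Sort the mutated genes of one sample by tier, then alphabetically."""
    genes = set(sample_genes)
    return sorted(genes, key=lambda g: (tier(g, genes), g))

ranked = rank_mutations(["GENE_X", "GENE_E", "GENE_D", "GENE_A", "GENE_F", "GENE_C"])
print(ranked)
```

Note that the tier of a gene such as `GENE_D` depends on the other mutations in the same sample, which is why the ranking is computed per sample rather than per gene.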


Some of these methods can be used both to interpret the genomic data of individual patients and to analyze genomic data in retrospective studies. We hope this overview offers some useful pointers.

1.    Van Allen EM, Wagle N and Levy MA. Clinical analysis and interpretation of cancer genome data. Journal of Clinical Oncology: official journal of the American Society of Clinical Oncology. 2013; 31(15):1825-1833.

2.    Yuan Y, Van Allen EM, Omberg L, Wagle N, Amin-Mansour A, Sokolov A, Byers LA, Xu Y, Hess KR, Diao L, Han L, Huang X, Lawrence MS, Weinstein JN, Stuart JM, Mills GB, et al. Assessing the clinical utility of cancer genomic and proteomic data across tumor types. Nature Biotechnology. 2014; 32(7):644-652.

3.    Van Allen EM, Wagle N, Stojanov P, Perrin DL, Cibulskis K, Marlow S, Jane-Valbuena J, Friedrich DC, Kryukov G, Carter SL, McKenna A, Sivachenko A, Rosenberg M, Kiezun A, Voet D, Lawrence M, et al. Whole-exome sequencing and clinical interpretation of formalin-fixed, paraffin-embedded tumor samples to guide precision cancer medicine. Nature Medicine. 2014; 20(6):682-688.

4.    Gagan J and Van Allen EM. Next-generation sequencing to guide cancer therapy. Genome Medicine. 2015; 7(1):80.
