[Figure 5 image omitted. Recoverable panel content: (A) heatmap of gwCHASMplus scores by position and mutant amino acid class (hydrophobic, polar, charged); (B) lipid phosphatase activity versus gwCHASMplus, with lowess fit; (C) protein abundance versus gwCHASMplus, with lowess fit; (D) Spearman rho of gwCHASMplus with protein abundance and with phosphatase activity; (E) lipid phosphatase activity and (F) protein abundance for driver missense mutations (rare, intermediate, common, all), other missense mutations, and truncating mutations; (G) specificity, sensitivity, and precision metrics for ParsSNP, CanDrA, and gwCHASMplus.]
Figure 5. CHASMplus Predictions Correlate with Multiplexed Functional Assays in PTEN
See also Table S6.
(A) Heatmap displaying gene-weighted CHASMplus scores (gwCHASMplus) of PTEN missense mutations.
(D) Both gwCHASMplus correlations had a larger absolute value than the correlation between the two experiments (Spearman rho = 0.351). *** = p < 0.001.
r = 0.35), suggesting that it may be capturing more than one mode of damage in PTEN (Figure 5D).
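The correlations reported in Figure 5D are Spearman rank correlations. As a minimal sketch of how such a value can be computed, the following uses only the Python standard library and treats rho as the Pearson correlation of the rank vectors. The function names and the example values are illustrative assumptions, not the authors' code or data.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start..end, 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rho, computed as the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical predictor scores and assay readouts (not values from the study).
scores = [0.05, 0.20, 0.45, 0.70, 0.90]
activity = [0.3, -0.5, -1.2, -2.8, -3.9]
print(spearman_rho(scores, activity))  # strictly decreasing relationship, so rho is -1.0
```

Because only ranks enter the calculation, the statistic is insensitive to the scale of either assay, which makes it a reasonable choice for comparing a score against two experiments measured in different units.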
Next, we examined the lipid phosphatase activity and protein abundance for the PTEN mutations that we predicted as drivers in the TCGA. We observed that these driver missense mutations, regardless of frequency, had significantly lower lipid phosphatase activity than other missense mutations in PTEN (common: p = 0.008; intermediate: p = 1.9e−9; rare: p = 1.6e−18; Mann-Whitney U test; Figure 5E). Moreover, when we compared the median lipid phosphatase activity of rare, intermediate, and common driver missense mutations, we saw a trend toward lower lipid phosphatase activity with increasing frequency. Truncating mutations had the lowest activity (p = 1.6e−112; Mann-Whitney U test). A likely explanation is that greater decreases in PTEN lipid phosphatase activity may promote tumor growth, so that tumor clones with these mutations are positively selected in many patients. This would result in more damaging PTEN variants being more frequently observed. In contrast, we did not find a similar pattern associating increased frequency with lower PTEN protein abundance. These results are consistent with reduced lipid phosphatase activity being more strongly correlated than protein abundance with variants labeled as pathogenic in ClinVar (Figure S6).
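The group comparisons above rely on the Mann-Whitney U test. The sketch below illustrates the test in plain Python, using the normal approximation with a tie correction and no continuity correction. These are simplifying assumptions, and statistical packages may use exact or corrected variants. The activity values are hypothetical and are not data from the study.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for x, two-sided p value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # every observation is tied, so there is no evidence of a shift
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u1, p_value

# Hypothetical lipid phosphatase activity scores (lower means more damaged).
driver_missense = [-3.1, -2.8, -3.5, -2.2, -4.0, -2.9]
other_missense = [-0.3, 0.1, -1.0, 0.4, -0.6, 0.2, -0.1]
u_stat, p_value = mann_whitney_u(driver_missense, other_missense)
print(u_stat, p_value)
```

The test compares ranks rather than raw values, so it makes no assumption that assay scores are normally distributed within each mutation group.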
Next, we considered whether ParsSNP and CanDrA (2nd and 3rd ranked in pan-cancer benchmarks; Figure 2E) could predict the lipid phosphatase activity of PTEN. Compared to these methods, CHASMplus demonstrated substantially higher specificity (specificity = 0.89 versus 0.42 and 0.0 for ParsSNP and CanDrA, respectively), and the difference is statistically significant (p < 2.2e−16; McNemar's test; Figure 5G). Specificity is defined as the proportion of negative examples, in this case mutations that do not damage lipid phosphatase activity, that are correctly predicted. Next, we applied the metrics of sensitivity, precision, and F1 score (Figure 5G). CHASMplus had the best balance of sensitivity and precision (highest F1 score) and the best precision. The other two methods have higher sensitivity at identifying damaging mutations but at the cost of poor specificity. We attribute the better performance of CHASMplus to our control of errors using the statistically rigorous false discovery rate, which is not used by the other methods.
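To make the metrics concrete, the following sketch computes specificity, sensitivity, precision, and F1 from paired boolean labels, and runs a McNemar test on the paired correctness of two methods. It assumes the chi-square form with continuity correction; the exact variant used in the paper is not stated here. All function names and inputs are illustrative.

```python
import math

def classification_metrics(is_damaging, predicted_driver):
    """Compute specificity, sensitivity, precision and F1 from paired booleans."""
    pairs = list(zip(is_damaging, predicted_driver))
    tp = sum(1 for truth, pred in pairs if truth and pred)
    tn = sum(1 for truth, pred in pairs if not truth and not pred)
    fp = sum(1 for truth, pred in pairs if not truth and pred)
    fn = sum(1 for truth, pred in pairs if truth and not pred)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "specificity": ratio(tn, tn + fp),  # negatives correctly left uncalled
        "sensitivity": ratio(tp, tp + fn),  # damaging mutations recovered
        "precision": ratio(tp, tp + fp),    # calls that are truly damaging
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }

def mcnemar_test(correct_a, correct_b):
    """McNemar chi-square test (1 df, continuity-corrected) on paired correctness.

    Only discordant pairs, where exactly one method is correct, are informative.
    Returns (statistic, p value).
    """
    a_only = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    b_only = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    if a_only + b_only == 0:
        return 0.0, 1.0
    statistic = (abs(a_only - b_only) - 1) ** 2 / (a_only + b_only)
    # Survival function of a chi-square variable with one degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical labels: True means the mutation damages lipid phosphatase activity.
truth = [True, True, True, False, False, False, False, True]
calls = [True, True, False, False, False, True, False, False]
print(classification_metrics(truth, calls))
```

A method that calls every mutation damaging reaches a sensitivity of 1.0 but a specificity of 0.0, which is why the text reports specificity alongside sensitivity and precision.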
The Trajectory of Driver Discovery across Diverse Cancer Types
We next sought to understand whether cancer types fundamentally differed in their usage of driver missense mutations. Based on our cancer-type-specific models from CHASMplus, we found that the diversity and prevalence of driver missense mutations varied considerably across TCGA cancer types (Figure 6A; STAR Methods). Using K-means clustering, we found that cancer types could be grouped into high diversity and low prevalence (12 cancer types), high diversity and high prevalence (15 cancer types), and low diversity and high prevalence (5 cancer types). These differences were not associated with intra-tumor heterogeneity or normal contamination, as assessed by mean variant allele fraction (VAF) of a cancer type (p > 0.05; correlation test; STAR Methods). Nor were they associated with TCGA sample size for a particular cancer type. For example, while both pancreatic ductal adenocarcinoma (PAAD) and sarcoma (SARC) had similar sample sizes (n = 155 and 204, respectively), PAAD had high prevalence and low diversity, while SARC had low prevalence and high diversity.
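The grouping of cancer types by diversity and prevalence can be illustrated with a small K-means sketch. The version below is plain Lloyd's algorithm with a deterministic farthest-first initialization, which is an assumption made here for reproducibility; the paper's actual settings are described in its STAR Methods. The (diversity, prevalence) coordinates are hypothetical and do not correspond to real cancer types.

```python
import math

def farthest_first_centroids(points, k):
    """Pick k starting centroids: the first point, then repeatedly the point
    farthest from all centroids chosen so far (deterministic initialization)."""
    centroids = [points[0]]
    while len(centroids) < k:
        next_point = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(next_point)
    return centroids

def k_means(points, k, max_iter=100):
    """Lloyd's algorithm. Returns (cluster label per point, final centroids)."""
    centroids = farthest_first_centroids(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical (diversity, prevalence) pairs, one per cancer type.
cancer_types = [
    (0.90, 0.10), (0.80, 0.15), (0.85, 0.20),  # high diversity, low prevalence
    (0.90, 0.80), (0.80, 0.90), (0.85, 0.85),  # high diversity, high prevalence
    (0.20, 0.90), (0.10, 0.80), (0.15, 0.85),  # low diversity, high prevalence
]
labels, centroids = k_means(cancer_types, k=3)
print(labels)
print(centroids)
```

K-means requires the number of clusters to be chosen in advance, and its result can depend on initialization and on how the two axes are scaled, so any real analysis would need to check the stability of the grouping.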