    [Figure 5 graphic omitted. Recoverable panel and axis labels: mutant amino acid (hydrophobic, polar); gwCHASMplus; lipid phosphatase activity; protein abundance; lowess fit; Spearman rho; driver missense mutations (rare, intermediate, common), all other missense, and truncating; specificity; ParsSNP; CanDrA.]
    Figure 5. CHASMplus Predictions Correlate with Multiplexed Functional Assays in PTEN
    See also Table S6.
    (A) Heatmap displaying gene-weighted CHASMplus scores (gwCHASMplus) of PTEN missense mutations.
    (D) Both gwCHASMplus correlations had a larger absolute value than the correlation between the two experiments (Spearman rho = 0.351). *** = p < 0.001.
    r = 0.35), suggesting that it may be capturing more than one mode of damage in PTEN (Figure 5D).
    Next, we examined the lipid phosphatase activity and protein abundance for the PTEN mutations that we predicted as drivers in the TCGA. We observed that these driver missense mutations, regardless of frequency, had significantly lower lipid phosphatase activity than other missense mutations in PTEN (common: p = 0.008; intermediate: p = 1.9e-9; rare: p = 1.6e-18; Mann-Whitney U test; Figure 5E). Moreover, when we compared the median lipid phosphatase activity of rare, intermediate, and common driver missense mutations, we saw a trend toward lower lipid phosphatase activity with increasing frequency. Truncating mutations had the lowest activity (p = 1.6e-112; Mann-Whitney U test). A likely explanation is that greater decreases in PTEN lipid phosphatase activity may promote tumor growth, so that tumor clones with these mutations are positively selected in many patients. This would result in more damaging PTEN variants being observed more frequently. In contrast, we did not find a similar pattern associating increased frequency with lower PTEN protein abundance. These results are consistent with reduced lipid phosphatase activity being more strongly correlated than protein abundance with variants labeled as pathogenic in ClinVar (Figure S6).
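The group comparison described above can be illustrated with a minimal sketch of a two-sided Mann-Whitney U test. This is not the authors' code: it uses a normal approximation with tie correction and no continuity correction (an assumption, since the text does not say which variant was used), and the activity scores are invented toy values, not data from the paper.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for x, p-value).
    """
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2

    n = n1 + n2
    tie_counts = {}
    for v in combined:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        raise ValueError("all observations are identical; test is undefined")

    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))
    return u1, p_value


# Hypothetical lipid phosphatase activity scores (toy values only).
driver_activity = [-3.1, -2.8, -2.5, -2.0, -1.9]
other_activity = [-1.5, -0.4, 0.0, 0.2, 0.3, 0.5]

u_stat, p = mann_whitney_u(driver_activity, other_activity)
print(f"U = {u_stat}, p = {p:.4f}")
```

For small samples an exact test would be preferable to the normal approximation; the approximation is used here only to keep the sketch free of external dependencies.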
    Next, we considered whether ParsSNP and CanDrA (ranked second and third in pan-cancer benchmarks; Figure 2E) could predict the lipid phosphatase activity of PTEN. Compared to these methods, CHASMplus demonstrated substantially higher specificity (0.89 versus 0.42 and 0.0 for ParsSNP and CanDrA, respectively), and the difference was statistically significant (p < 2.2e-16; McNemar’s test; Figure 5G). Specificity is defined as the proportion of negative examples, in this case mutations that do not damage lipid phosphatase activity, that are correctly predicted. Next, we applied the metrics of sensitivity, precision, and F1 score (Figure 5G). CHASMplus had the best balance of sensitivity and precision (highest F1 score) and the best precision. The other two methods had higher sensitivity for identifying damaging mutations, but at the cost of poor specificity. We attribute the better performance of CHASMplus to our control of errors using the statistically rigorous false discovery rate, which is not used by the other methods.
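As a reminder of how the four metrics relate, the following minimal sketch computes specificity, sensitivity, precision, and F1 score from binary labels. The function names and the toy labels are illustrative assumptions; here True stands for "damaging to lipid phosphatase activity", and none of the values come from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for boolean labels (True = damaging)."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Return specificity, sensitivity, precision, and F1 as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    nan = float("nan")
    # Specificity: fraction of true negatives among all actual negatives.
    specificity = tn / (tn + fp) if (tn + fp) else nan
    # Sensitivity (recall): fraction of actual positives that are detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    # Precision: fraction of predicted positives that are correct.
    precision = tp / (tp + fp) if (tp + fp) else nan
    # F1: harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else nan
    return {
        "specificity": specificity,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


# Toy example: 3 damaging and 5 non-damaging mutations (hypothetical).
y_true = [True, True, True, False, False, False, False, False]
y_pred = [True, True, False, True, False, False, False, False]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

A method that calls every mutation damaging reaches a sensitivity of 1.0 but a specificity of 0.0, which is why sensitivity alone is not enough to compare methods.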
    The Trajectory of Driver Discovery across Diverse Cancer Types
    We next sought to understand whether cancer types fundamentally differed in their usage of driver missense mutations. Based on our cancer-type-specific models from CHASMplus, we found that the diversity and prevalence of driver missense mutations varied considerably across TCGA cancer types (Figure 6A; STAR Methods). Using K-means clustering, we found that cancer types could be grouped into high diversity and low prevalence (12 cancer types), high diversity and high prevalence (15 cancer types), and low diversity and high prevalence (5 cancer types). These differences were not associated with intra-tumor heterogeneity or normal contamination, as assessed by mean variant allele fraction (VAF) of a cancer type (p > 0.05; correlation test; STAR Methods). Nor were they associated with TCGA sample size for a particular cancer type. For example, while both pancreatic ductal adenocarcinoma (PAAD) and sarcoma (SARC) had similar sample sizes (n = 155 and 204, respectively), PAAD had high prevalence and low diversity, while SARC had low prevalence and high diversity.
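The grouping step can be sketched with a small K-means implementation (Lloyd's algorithm) applied to two features per cancer type. This is an illustration under stated assumptions, not the authors' procedure: the diversity and prevalence measures are defined in the STAR Methods and are not reproduced here, so the points below are invented values on an arbitrary 0 to 1 scale, and the starting centroids are supplied by hand to keep the result deterministic.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, init_centroids, n_iter=100):
    """Lloyd's algorithm with caller-supplied starting centroids.

    Returns (labels, centroids), where labels[i] is the index of the
    centroid nearest to points[i].
    """
    centroids = [tuple(c) for c in init_centroids]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical (diversity, prevalence) pairs, one per cancer type.
points = [
    (0.90, 0.10), (0.80, 0.20), (0.85, 0.15),  # high diversity, low prevalence
    (0.90, 0.90), (0.80, 0.80),                # high diversity, high prevalence
    (0.10, 0.90), (0.20, 0.80),                # low diversity, high prevalence
]
labels, centroids = kmeans(points, [points[0], points[3], points[5]])
print(labels)
```

With k = 3, matching the three groups reported in the text, each toy point is assigned to the cluster named in its comment. Real data would call for several random restarts and for features placed on comparable scales.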