

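| Analysis dimension | Core idea | Technical characteristics |
|---|---|---|
| DNA methylation | Detects specific methylation (5mC) patterns on DNA molecules. Cancer cells show genome-wide hypomethylation together with hypermethylation of specific regions (such as tumor suppressor gene promoters). | Directly reflects the regulation of gene expression and is one of the most widely used technical approaches today. Detecting the specific methylation signals of circulating tumor DNA (ctDNA) in plasma enables early cancer screening and tissue-of-origin localization. |
| Fragmentomics | Analyzes features of cell-free DNA (cfDNA) in blood, such as fragment size, end sequences, and nucleosome protection patterns. Tumor-derived ctDNA differs from DNA released by normal cells in these features. | An indirect method that does not detect sequence variants directly. It provides information complementary to methylation and can be integrated with methylation and other multi-omics data to build more accurate AI models. |
| Gene mutations | Detects changes in gene sequence associated with cancer drivers, such as single-nucleotide variants (SNVs) and insertions/deletions (indels). | The classic approach to tumor genetic testing. In early screening, however, early-stage tumors shed very little ctDNA, so mutation signals are weak and sensitivity is usually lower than that of methylation; mutations serve mainly as supplementary information. |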
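
The fragment-size feature mentioned for fragmentomics can be illustrated with a small sketch: count short and long cfDNA fragments in fixed genomic bins and take their ratio. This is only a toy illustration, not any published pipeline. The function name `short_long_ratio_per_bin`, the length cut-offs (100–150 bp as "short", 151–220 bp as "long"), and the toy coordinates are all assumptions chosen for the example; real workflows read fragments from aligned sequencing data and apply further corrections.

```python
from collections import defaultdict

BIN_SIZE = 5_000_000       # 5 Mb bins (assumed bin width for this sketch)
SHORT_RANGE = (100, 150)   # assumed length range for "short" fragments, in bp
LONG_RANGE = (151, 220)    # assumed length range for "long" fragments, in bp


def short_long_ratio_per_bin(fragments, bin_size=BIN_SIZE):
    """Return {(chrom, bin_index): short/long count ratio}.

    fragments: iterable of (chrom, start, end) with 0-based, half-open coordinates.
    Bins without any long fragment are skipped to avoid division by zero.
    """
    counts = defaultdict(lambda: [0, 0])  # [short_count, long_count] per bin
    for chrom, start, end in fragments:
        length = end - start
        key = (chrom, start // bin_size)  # assign the fragment by its start position
        if SHORT_RANGE[0] <= length <= SHORT_RANGE[1]:
            counts[key][0] += 1
        elif LONG_RANGE[0] <= length <= LONG_RANGE[1]:
            counts[key][1] += 1
    return {key: s / l for key, (s, l) in counts.items() if l > 0}


# Hypothetical toy fragments; not real data.
toy_fragments = [
    ("chr1", 1_000, 1_140),          # 140 bp, short, bin 0
    ("chr1", 2_000, 2_167),          # 167 bp, long, bin 0
    ("chr1", 3_000, 3_166),          # 166 bp, long, bin 0
    ("chr1", 5_000_100, 5_000_230),  # 130 bp, short, bin 1
    ("chr1", 5_000_500, 5_000_645),  # 145 bp, short, bin 1
    ("chr1", 5_001_000, 5_001_170),  # 170 bp, long, bin 1
]
ratios = short_long_ratio_per_bin(toy_fragments)
print(ratios)
```

A vector of such per-bin ratios across the genome is the kind of input a classifier could be trained on; a higher ratio in a bin means relatively more short fragments there.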


| Algorithm | Input features | Machine learning algorithm | Cancers analyzed | Sequencing depth required for robustness | Performance summary |
|---|---|---|---|---|---|
| DNA evaluation of fragments for early interception (DELFI) | Fragment size ratios and distributions in 5 Mb genomic bins | Gradient boosting | Breast, colorectal, lung, ovarian, pancreatic, gastric, bile duct | WGS 0.1× coverage | 57%–99% sensitivity at 98% specificity; overall AUC = 0.94; tissue-of-origin prediction: 75% accuracy |
| Genome-wide analysis of fragment ends (GALYFRE) | Information-weighted fraction of aberrant fragments (iwFAF), which parametrizes the overall aberrancy of fragment end positions relative to previously defined recurrently protected regions; nucleotide frequencies 10 bp either side of fragment ends | Random forest | Breast, cholangiocarcinoma, glioblastoma, melanoma | 1 million reads per sample (WGS ∼0.05× coverage) | 45.5%–94.3% sensitivity at 95% specificity; overall AUC = 0.91; stage I AUC = 0.87 |
| Examination of cfDNA with end selection (EXCEL) | Discordance metric (N-index), which parametrizes the difference between cfDNA-inferred and expected hematopoietic nucleosome positions in 5 Mb genomic bins | Gradient boosting | Bile duct, breast, colorectal, lung, ovarian, pancreatic | Not shown | 78.4%–100% sensitivity at 95% specificity across all cancer types and stages; overall AUC = 0.95 |
| Instruction-tuned large language model for the assessment of cancer (iLLMAC) | Sentence containing tokens of 5′ 4-mer end motifs ordered by frequency; end motif “founder profiles” calculated by non-negative matrix factorization; motif diversity score | Large language model (LLaMA, 7B parameters) | Liver (hepatocellular), cervical, colorectal, esophageal, ovarian, head and neck, lung | 0.1 million reads per sample | 100% sensitivity at 66.7% specificity; overall AUC = 0.912 |
| End motif inspection via transformer (EMIT) | Sentence containing tokens of 5′ 4-mer end motifs ordered by frequency | Large language model (up to 32 Mb parameter size) | Lung | 10 million reads per sample (WGS ∼0.5× coverage) | AUC = 0.962; identified several end motifs with higher attention scores in cancer |
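
Two of the input features in the table, 5′ 4-mer end motifs ordered by frequency and the motif diversity score, can be sketched in a few lines. The sketch below assumes the motif diversity score is the Shannon entropy of the 256 end-motif frequencies normalized to the range 0 to 1, and that a "sentence" is simply the motifs joined by spaces in descending frequency. The function names, the tie-breaking rule, and the toy sequences are hypothetical and are not taken from iLLMAC or EMIT.

```python
import math
from collections import Counter
from itertools import product

ALL_4MERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 possible motifs


def end_motif_frequencies(fragment_seqs):
    """Frequency of each 4-mer found at the 5' end of the given fragment sequences."""
    counts = Counter()
    for seq in fragment_seqs:
        motif = seq[:4].upper()
        # Skip fragments that are too short or contain ambiguous bases such as N.
        if len(motif) == 4 and set(motif) <= set("ACGT"):
            counts[motif] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no usable fragment ends")
    return {m: counts[m] / total for m in ALL_4MERS}


def motif_diversity_score(freqs):
    """Normalized Shannon entropy of the motif frequencies (0 = one motif only, 1 = uniform)."""
    entropy = -sum(p * math.log(p) for p in freqs.values() if p > 0)
    return entropy / math.log(len(freqs))


def motif_sentence(freqs, top_k=5):
    """Join the top_k motifs by descending frequency; ties are broken alphabetically (assumption)."""
    ranked = sorted(freqs, key=lambda m: (-freqs[m], m))
    return " ".join(ranked[:top_k])


# Hypothetical toy fragment sequences; not real data.
toy_seqs = ["CCCAGGT", "CCCATTG", "CCAGAAC", "TGGACCA"]
freqs = end_motif_frequencies(toy_seqs)
mds = motif_diversity_score(freqs)
sentence = motif_sentence(freqs, top_k=3)
print(mds, sentence)
```

In a real setting the frequencies would be computed from hundreds of thousands to millions of reads per sample, and the resulting sentence or score would be passed to the downstream model.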

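Original content notice: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.
In case of infringement, please contact cloudcommunity@tencent.com for removal.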