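Systematic benchmarks show that long-read quantification is more accurate than short-read quantification at complex gene loci (>10 isoforms) (Chen et al., 2023).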
IV. How to read the results: a primer on units
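| Metric | Meaning (one sentence) | Use |
|---|---|---|
| raw counts | Number of reads assigned to each gene, uncorrected | Not directly comparable |
| TPM | Transcripts per million; corrected for gene length and sequencing depth | Comparing the same gene across samples |
| FPKM/RPKM | Older units, now largely superseded by TPM | Avoid them; they are easy to misuse |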
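To make the difference between raw counts and TPM concrete, here is a minimal Python sketch of the TPM calculation. The function name `counts_to_tpm` and the toy numbers are illustrative assumptions, not output from any tool; real quantifiers such as Salmon use effective lengths and report TPM for you.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts to TPM.

    counts     -- dict mapping gene ID to raw read count
    lengths_bp -- dict mapping gene ID to gene (or effective) length in bp
    """
    # Step 1: reads per kilobase, which removes the gene-length effect.
    rpk = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    # Step 2: rescale so the sample sums to one million,
    # which removes the sequencing-depth effect.
    total = sum(rpk.values())
    if total == 0:
        return {g: 0.0 for g in counts}
    return {g: v / total * 1e6 for g, v in rpk.items()}


# Toy example with made-up numbers (not real data).
counts = {"geneA": 100, "geneB": 200, "geneC": 300}
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 1000}
tpm = counts_to_tpm(counts, lengths)
```

In this toy sample geneB has twice the reads of geneA but is also twice as long, so the two end up with the same TPM. That is the length correction at work, and it is why raw counts alone are not directly comparable.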
Differential expression tools (DESeq2, edgeR) take only raw counts as input; they perform their own normalization internally (Love et al., 2014).
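As a rough picture of what that internal normalization does, the sketch below implements a simplified version of the median-of-ratios idea behind DESeq2's size factors. It is an illustration under simplifying assumptions (the function name and toy counts are made up, and genes with a zero in any sample are simply skipped); it is not the package's actual code, and edgeR uses a different method.

```python
import math
from statistics import median


def median_of_ratios_size_factors(count_matrix):
    """count_matrix: dict mapping sample name -> list of raw counts,
    with genes in the same order in every sample."""
    samples = list(count_matrix)
    n_genes = len(count_matrix[samples[0]])
    # Keep only genes with nonzero counts in every sample so logs are defined.
    usable = [i for i in range(n_genes)
              if all(count_matrix[s][i] > 0 for s in samples)]
    if not usable:
        raise ValueError("no gene has nonzero counts in every sample")
    # Log of the per-gene geometric mean across samples (a pseudo-reference).
    log_geo_mean = {
        i: sum(math.log(count_matrix[s][i]) for s in samples) / len(samples)
        for i in usable
    }
    # Size factor = median over genes of (count / geometric mean).
    return {
        s: math.exp(median(math.log(count_matrix[s][i]) - log_geo_mean[i]
                           for i in usable))
        for s in samples
    }


# Toy data: sample2 was sequenced about twice as deeply as sample1.
raw = {
    "sample1": [10, 20, 30, 0],
    "sample2": [20, 40, 60, 5],
}
size_factors = median_of_ratios_size_factors(raw)
normalized = {s: [c / size_factors[s] for c in raw[s]] for s in raw}
```

Because the scaling is estimated from the raw counts themselves, feeding these tools TPM or FPKM values would distort it, which is why the input has to be raw counts.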
V. Common pitfalls
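1. Keep the reference genome and transcript annotation versions consistent
Mixing releases such as GENCODE v39 and vM28 can change quantification results drastically (Frankish et al., 2021). Note that v39 is a human release while the vM series is for mouse, so confirm the species as well as the release number.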
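One cheap sanity check is to compare the transcript IDs in your quantification output against the IDs in the annotation you plan to use downstream. The sketch below assumes Ensembl/GENCODE-style IDs with a `.N` version suffix; the function names and the example IDs are hypothetical, chosen only for illustration.

```python
def strip_version(transcript_id):
    """Drop the version suffix, e.g. 'ENST00000000002.3' -> 'ENST00000000002'."""
    return transcript_id.split(".", 1)[0]


def compare_transcript_sets(quant_ids, annotation_ids):
    """Classify quantified transcript IDs relative to an annotation."""
    quant = set(quant_ids)
    annot = set(annotation_ids)
    exact = quant & annot
    # IDs that match only after removing the version suffix
    # suggest that two different releases were mixed.
    annot_base = {strip_version(t) for t in annot}
    version_mismatch = {t for t in quant - annot
                        if strip_version(t) in annot_base}
    # IDs with no counterpart at all suggest a wrong species or source.
    missing = quant - annot - version_mismatch
    return {"exact": exact,
            "version_mismatch": version_mismatch,
            "missing": missing}


# Made-up IDs for illustration only.
quant_ids = ["ENST00000000001.1", "ENST00000000002.3", "ENSMUST00000000001.5"]
annotation_ids = ["ENST00000000001.1", "ENST00000000002.4"]
report = compare_transcript_sets(quant_ids, annotation_ids)
```

A non-empty `version_mismatch` or `missing` set is a signal to rebuild the index and the transcript-to-gene mapping from the same release before trusting any numbers.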
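2. Genes rich in repetitive sequence (for example, SNORD clusters)
Short reads can barely tell the copies apart; long reads or targeted capture are more reliable (Zhang et al., 2020).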
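3. Single-end vs paired-end
Salmon and kallisto both accept single-end reads, but remember to record the library layout in colData when you run the differential analysis (Smith, 2019).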
4. ERCC spike-in
If the experiment included external RNA standards, you can use them to assess the linear range of quantification (Jiang et al., 2011).
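One way to check linearity is to regress log2 of the measured abundance on log2 of the known spike-in concentration: a slope close to 1 and a high R² suggest the quantification is linear over that range. The sketch below is a plain least-squares fit with made-up numbers; the function name `loglog_fit` and the input values are assumptions for illustration, and real concentrations come from the spike-in mix documentation.

```python
import math


def loglog_fit(known_conc, observed):
    """Least-squares fit of log2(observed) on log2(known concentration).

    Spike-ins with a zero concentration or zero measurement are dropped,
    because their logarithm is undefined.
    Returns (slope, intercept, r_squared).
    """
    pairs = [(math.log2(c), math.log2(o))
             for c, o in zip(known_conc, observed) if c > 0 and o > 0]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two detected spike-ins")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0:
        raise ValueError("all known concentrations are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else float("nan")
    return slope, intercept, r_squared


# Made-up values: measurements exactly proportional to concentration.
known = [1.0, 2.0, 4.0, 8.0, 16.0]
measured_tpm = [3.0, 6.0, 12.0, 24.0, 48.0]
slope, intercept, r_squared = loglog_fit(known, measured_tpm)
```

With real data, the low-concentration spike-ins usually scatter away from the line first, and that is roughly where the reliable range ends.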
VI. Mind-map summary
VII. References
• Anders, S. (2015). HTSeq documentation: counting reads in features. HTSeq GitHub.
• Bray, N. L. et al. (2016). Near-optimal probabilistic RNA-seq quantification. Nature Biotechnology, 34, 525–527.
• Chen, Y. et al. (2023). Context-aware transcript quantification from long-read RNA-seq data with Bambu. Nature Methods.
• Dillies, M. A. et al. (2013). A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Briefings in Bioinformatics, 14, 671–683.
• Frankish, A. et al. (2021). GENCODE 2021. Nucleic Acids Research, 49(D1), D916–D923.
• Jain, M. et al. (2018). Nanopore sequencing and assembly of a human genome with ultra-long reads. Nature Biotechnology, 36, 338–345.
• Jiang, L. et al. (2011). Synthetic spike-in standards for RNA-seq experiments. Genome Research, 21, 1543–1551.
• Li, B. & Dewey, C. N. (2011). RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics, 12, 323.
• Love, M. I. et al. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15, 550.
• Patro, R. et al. (2017). Salmon provides fast and bias-aware quantification of transcript expression. Nature Methods, 14, 417–419.
• Prjibelski, A. D. et al. (2023). Accurate isoform discovery with IsoQuant using long reads. Nature Biotechnology.
• Smith, T. (2019). Salmon vs. Kallisto: a quick guide for beginners. Biostars Blog.
• Zhang, Y. et al. (2020). Model-based analysis of ChIP-Seq (MACS). Genome Biology, 9, R137.