
Input is 55 million reads, but only 1 million are used for alignment

Stack Overflow user
Asked 2018-02-27 04:14:12
1 answer, 71 views, 0 followers, 0 votes

I ran the following command with TopHat (v2.1.0) to align the reads from my RNA-seq FASTQ file (Bowtie2 v2.2.6.0), using the Bowtie2 genome .bt2 indexes (Homo_sapiens_UCSC_hg19):

tophat2 -p 8 -G /home/ajsn6c/Desktop/Kumar_RNA-seq/Homo_sapiens_UCSC_hg19 /Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/hg19.gtf /home/ajsn6c/Desktop/Kumar_RNA-seq/Homo_sapiens_UCSC_hg19/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome HPDE_S11_L002_R1_001.fastq

My FASTQ file is about 13 GB. However, after alignment, my accepted_hits file is only 50 MB.

Here is the alignment output, which shows that about 55 million reads were kept:

[2018-02-21 13:58:33] Beginning TopHat run (v2.1.0)

[2018-02-21 13:58:33]     Checking for Bowtie
      Bowtie version:    2.2.6.0
[2018-02-21 13:58:33] Checking for Bowtie index files (genome)..
[2018-02-21 13:58:33] Checking for reference FASTA file
[2018-02-21 13:58:33] Generating SAM header for /home/ajsn6c/Desktop /Kumar_RNA-seq/Homo_sapiens_UCSC_hg19/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome
[2018-02-21 13:58:35] Reading known junctions from GTF file
[2018-02-21 13:58:39] Preparing reads
 left reads: min. length=12, max. length=101, 55970267 kept reads (45104 discarded)
Warning: short reads (<20bp) will make TopHat quite slow and take large amount of memory because they are likely to be mapped in too many places
[2018-02-21 14:17:45] Building transcriptome data files Panc1/tmp/genes
[2018-02-21 14:17:59] Building Bowtie index from genes.fa
[2018-02-21 14:32:14] Mapping left_kept_reads to transcriptome genes with Bowtie2 
[2018-02-21 15:38:44] Resuming TopHat pipeline with unmapped reads
[2018-02-21 15:38:44] Mapping left_kept_reads.m2g_um to genome genome with Bowtie2 
[2018-02-21 16:17:07] Mapping left_kept_reads.m2g_um_seg1 to genome genome with Bowtie2 (1/4)
[2018-02-21 16:18:13] Mapping left_kept_reads.m2g_um_seg2 to genome genome with Bowtie2 (2/4)
[2018-02-21 16:19:32] Mapping left_kept_reads.m2g_um_seg3 to genome genome with Bowtie2 (3/4)
[2018-02-21 16:20:46] Mapping left_kept_reads.m2g_um_seg4 to genome genome with Bowtie2 (4/4)
[2018-02-21 16:21:59] Searching for junctions via segment mapping
[2018-02-21 16:25:24] Retrieving sequences for splices
[2018-02-21 16:27:18] Indexing splices
Building a SMALL index
[2018-02-21 16:27:37] Mapping left_kept_reads.m2g_um_seg1 to genome segment_juncs with Bowtie2 (1/4)
[2018-02-21 16:27:50] Mapping left_kept_reads.m2g_um_seg2 to genome segment_juncs with Bowtie2 (2/4)
[2018-02-21 16:28:03] Mapping left_kept_reads.m2g_um_seg3 to genome segment_juncs with Bowtie2 (3/4)
[2018-02-21 16:28:17] Mapping left_kept_reads.m2g_um_seg4 to genome segment_juncs with Bowtie2 (4/4)
[2018-02-21 16:28:31] Joining segment hits
[2018-02-21 16:31:02] Reporting output tracks

[2018-02-22 19:21:42] A summary of the alignment counts can be found in ./tophat_out/align_summary.txt
[2018-02-22 19:21:42] Run complete: 02:08:37 elapse

This is the alignment summary from the align_summary file:

reads:
      Input     :    926337
       Mapped   :    898584 (97.0% of input)
        of these:     14621 ( 1.6%) have multiple alignments (14 have >20)

97.0% overall read mapping rate.

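Why is the input only about 900K reads when 55 million reads were kept? The reads also have good Phred quality scores. Any ideas would be greatly appreciated!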

Thanks, Alex
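
To put a number on the discrepancy described above, the two counts can be pulled out of the log and the summary and compared. This is a minimal sketch, assuming the text is formatted exactly as pasted in this question; the function names `kept_reads` and `summary_input` are hypothetical helpers, not part of TopHat.

```python
import re


def kept_reads(log_text: str) -> int:
    """Return the 'kept reads' count from the 'Preparing reads' log entry."""
    match = re.search(r"(\d+) kept reads", log_text)
    if match is None:
        raise ValueError("no 'kept reads' entry found in log text")
    return int(match.group(1))


def summary_input(summary_text: str) -> int:
    """Return the 'Input' count from align_summary-style text."""
    match = re.search(r"Input\s*:\s*(\d+)", summary_text)
    if match is None:
        raise ValueError("no 'Input' entry found in summary text")
    return int(match.group(1))


# Text copied from the log and summary shown in the question.
log_line = " left reads: min. length=12, max. length=101, 55970267 kept reads (45104 discarded)"
summary = """reads:
      Input     :    926337
       Mapped   :    898584 (97.0% of input)
"""

kept = kept_reads(log_line)
reported = summary_input(summary)
fraction = reported / kept

print(f"{reported} of {kept} kept reads appear as input in the summary ({fraction:.1%})")
```

In practice the strings would be read from the run's log file and from align_summary.txt in the same output directory, so that both numbers are known to come from the same run.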


1 Answer

Stack Overflow user

Answered 2018-02-27 09:53:54

The following entries in the log file are odd:

[2018-02-21 14:17:45] Building transcriptome data files Panc1/tmp/genes

[2018-02-21 14:17:59] Building Bowtie index from genes.fa

Here is your tophat2 command (I have reformatted it for readability):

./tophat2 \
    -p 8 \
    -G /home/ajsn6c/Desktop/Kumar_RNA-seq/Homo_sapiens_UCSC_hg19 /Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/hg19.gtf \
    /home/ajsn6c/Desktop/Kumar_RNA-seq/Homo_sapiens_UCSC_hg19/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome \
    HPDE_S11_L002_R1_001.fastq

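  1. There appear to be some stray spaces (e.g. [...]Homo_sapiens_UCSC_hg19 /Homo_sapiens[...]); I am not sure whether that is the problem.
  2. According to your command, the transcriptome should be built from the file [...]/UCSC/hg19/Sequence/Bowtie2Index/hg19.gtf; I don't know where Panc1/tmp/genes comes from, but evidently that file is being used to build the reference transcriptome, rather than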
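
The effect of the stray space in point 1 can be illustrated by tokenizing the command the way a POSIX shell would. This is only a sketch using Python's standard `shlex` module on the command text as posted; whether the space was present in the command that was actually run, or was introduced when pasting, is not known.

```python
import shlex

# The command exactly as posted in the question, including the space
# between "Homo_sapiens_UCSC_hg19" and "/Homo_sapiens".
command = (
    "tophat2 -p 8 "
    "-G /home/ajsn6c/Desktop/Kumar_RNA-seq/Homo_sapiens_UCSC_hg19 "
    "/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/hg19.gtf "
    "/home/ajsn6c/Desktop/Kumar_RNA-seq/Homo_sapiens_UCSC_hg19"
    "/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome "
    "HPDE_S11_L002_R1_001.fastq"
)

# Split on unquoted whitespace, as a POSIX shell does.
tokens = shlex.split(command)

# The token right after -G is what would be passed as the GTF path.
g_index = tokens.index("-G")
g_value = tokens[g_index + 1]

# Everything after that is a positional argument.
positionals = tokens[g_index + 2:]

print("value given to -G:", g_value)
for i, arg in enumerate(positionals, start=1):
    print(f"positional {i}: {arg}")
```

With the space in place, the value following `-G` is a directory rather than a `.gtf` file, and three positional arguments follow it instead of the intended two (index base name and reads file). Removing the space, or quoting the whole path, keeps the GTF path as a single token.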
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/48996362
