文章/答案/技术大牛

发布

社区首页 >问答首页 >基于变异和人类参考的DNA序列构建

问基于变异和人类参考的DNA序列构建
EN

Stack Overflow用户

提问于 2013-09-17 14:15:52

回答 3查看 1.6K关注 0票数 4

1000基因组计划为我们提供了数千人的DNA序列相对于人类参考DNA序列“变异”的信息。变体存储在VCF文件中。

格式。基本上，对于该项目中的每个人，我们可以从VCF文件中获取他/她的DNA变异信息，例如，变异类型(例如插入/删除和SNP )以及变异相对于参考的位置。引用采用FASTA格式。通过将VCF文件中的一个人的变异信息与FASTA文件中的人类参考信息相结合，我想构建该人的DNA序列。

我的问题是:它是否已经存在，有些工具可以很好地执行任务，或者我必须自己编写脚本。

vcf-variant-call-format

bioinformatics

回答 3

Stack Overflow用户

回答已采纳

发布于 2013-09-27 00:23:50

来自VCFtools的perl脚本VCFtools似乎与您正在寻找的内容非常接近：

vcf-consensus  
Apply VCF variants to a fasta file to create consensus sequence.

Usage: cat ref.fa | vcf-consensus [OPTIONS] in.vcf.gz > out.fa
Options:
   -h, -?, --help         This help message.
   -H, --haplotype <int>  Apply only variants for the given haplotype (1,2)
   -s, --sample <name>    If not given, all variants are applied
Examples:
   samtools faidx ref.fa 8:11870-11890 | vcf-consensus in.vcf.gz > out.fa

新的fasta序列从参考fasta和变体调用文件？在“生物之星”上发布的问题的答案也可能有所帮助。

票数 4

Stack Overflow用户

发布于 2015-04-06 21:55:34

可以使用bcftools (https://github.com/samtools/bcftools)执行此任务：

bcftools consensus <file.vcf> \
  --fasta-ref <file> \
  --iupac-codes \
  --output <file> \
  --sample <name>

要安装bcftools，请执行以下操作：

git clone --branch=develop git://github.com/samtools/bcftools.git
git clone --branch=develop git://github.com/samtools/htslib.git
cd htslib && make && cd ..
cd bcftools && make && cd ..
sudo cp bcftools/bcftools /usr/local/bin/

您还可以将bcftools协商一致与samtools (http://www.htslib.org/)组合起来，从fasta文件中提取特定的时间间隔。有关更多信息，请参见bcftools共识：

About:   Create consensus sequence by applying VCF variants to a reference
         fasta file.
Usage:   bcftools consensus [OPTIONS] <file.vcf>
Options:
    -f, --fasta-ref <file>     reference sequence in fasta format
    -H, --haplotype <1|2>      apply variants for the given haplotype
    -i, --iupac-codes          output variants in the form of IUPAC ambiguity codes
    -m, --mask <file>          replace regions with N
    -o, --output <file>        write output to a file [standard output]
    -c, --chain <file>         write a chain file for liftover
    -s, --sample <name>        apply variants of the given sample
Examples:
   # Get the consensus for one region. The fasta header lines are then expected
   # in the form ">chr:from-to".
   samtools faidx ref.fa 8:11870-11890 | bcftools consensus in.vcf.gz > out.fa

票数 3

Stack Overflow用户

发布于 2016-11-25 17:39:09

如果您有一个fasta参考基因组和一个bam文件，您可以使用samtools、bcftools和vcfutils.pl (面向初学者的ps: samtools和bcftools都可以在计算集群或Linux中编译，如果是这样的话，只需在软件名称之前添加每个参考文件的位置；vcfutils已经是bcftools的perl脚本)。

samtools mpileup -d8000 -q 20 -Q 10 -uf  REFERENCE.fasta Your_File.bam | bcftools call -c | vcfutils.pl vcf2fq  > OUTPUT.fastq

d，-max- == -q，-min用于对齐使用== -Q的最小映射质量，-min-BQ最小基质量用于要被视为==的基座(当然，可以使用不同的值，请参见http://www.htslib.org/doc/samtools.html)

它生成一种奇怪的格式，看起来像fastq，但不是，所以不能使用转换器来转换它，但是可以使用下面的sed命令，这是我为这个输出编写的专用命令：

sed -i -e '/^+/,/^\@/{/^+/!{/^\@/!d}}; /^+/ d; s/@/>/g' OUTPUT.fastq

最后，确保将您的新fasta文件与您的参考文件进行比较，以确保一切正常。

编辑时要小心使用SED命令，IT可能会删除与我不同的质量评分情况下的一些读取。

票数 1

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/18852334

复制

相似问题

问基于变异和人类参考的DNA序列构建
EN

回答 3

Stack Overflow用户

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问基于变异和人类参考的DNA序列构建EN

回答 3

Stack Overflow用户

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问基于变异和人类参考的DNA序列构建
EN