Q: Modifying the location of a GenBank feature

Stack Overflow user

Asked 2014-07-08 16:04:51

Answers: 3, Views: 1K, Followers: 0, Score: 2

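Edit: I know that feature.type gives gene/CDS and that feature.qualifiers gives "db_xref"/"locus_tag"/"inference" and so on. Is there a feature.* attribute that lets me access the location directly (e.g. [5240:7267](+))?

This URL gives some more information, though I can't work out how to use it for my purpose.

Original:

I am trying to modify the location of a feature in a GenBank file. Essentially, I want to change the following part of a GenBank file: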

 gene            5240..7267
                 /db_xref="GeneID:887081"
                 /locus_tag="Rv0005"
                 /gene="gyrB"
 CDS             5240..7267
                 /locus_tag="Rv0005"
                 /inference="protein motif:PROSITE:PS00177"
                 ...........................

to

 gene            5357..7267
                 /db_xref="GeneID:887081"
                 /locus_tag="Rv0005"
                 /gene="gyrB"
 CDS             5357..7267
                 /locus_tag="Rv0005"
                 /inference="protein motif:PROSITE:PS00177"
                 .............................

Note the change from 5240 to 5357.

So far, from searching the internet and Stack Overflow, I have:

from Bio import SeqIO
gb_file = "mtbtomod.gb"
gb_record = SeqIO.parse(open(gb_file, "r+"), "genbank")
rvnumber = 'Rv0005'
newstart = 5357

final_features = []

for record in gb_record:
  for feature in record.features:
    if feature.type == "gene":
        if feature.qualifiers["locus_tag"][0] == rvnumber:
            if feature.location.strand == 1:
                feature.qualifiers["amend_position"] = "%s:%s" % (newstart, feature.location.end+1)
            else:
                pass  # do the reverse for the complementary strand
    final_features.append(feature)
  record.features = final_features
  with open("testest.gb","w") as testest:
    SeqIO.write(record, testest, "genbank")

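This basically creates a new qualifier called "amend_position". What I would like to do instead is modify the location directly (with or without creating a new file).

Rv0005 is just one example of a locus_tag I need to update. I have about 600 more locations to update, which is why I need a script. Help!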

python

biopython

genbank

3 Answers

Stack Overflow user

Accepted answer

Answered 2014-07-11 18:03:54

OK, I now have something that fully works. I'll post the code in case anyone needs something similar.

__author__ = 'Kavin'

from Bio import SeqIO
from Bio import SeqFeature
import xlrd
import sys
import re

workbook = xlrd.open_workbook(sys.argv[2])
sheet = workbook.sheet_by_index(0)
data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]

# Create a dict to store TSS data: locus_tag -> start codon and strand
TSS = {}
# For each entry (row), store the start codon and strand information
# (row range kept from the original script: skips the first two rows and the last row)
for i in range(2, sheet.nrows - 1):
    if data[i][5] < -0.7:   # Ensures BASS score is within significant range
        TSS[str(data[i][0])] = {'Direction': str(data[i][3]),
                                'StartCodon': int(data[i][4])}

# Create an output filename based on input filename
outfile_init = re.search(r'(.*)\.(\w*)', sys.argv[1])
outfile = outfile_init.group(1) + '_modified.' + outfile_init.group(2)

records = []
for record in SeqIO.parse(sys.argv[1], "genbank"):
    for feature in record.features:
        if feature.type in ("gene", "CDS"):
            # Some features may lack a locus_tag; skip them instead of raising KeyError
            locus_tag = feature.qualifiers.get("locus_tag", [None])[0]
            if locus_tag in TSS:
                newstart = TSS[locus_tag]['StartCodon']
                if feature.location.strand == 1:
                    # Forward strand: move the 0-based start, keep the end
                    feature.location = SeqFeature.FeatureLocation(
                        newstart - 1, int(feature.location.end), feature.location.strand)
                else:
                    # Reverse strand: the start codon is at the high coordinate, so move the end
                    feature.location = SeqFeature.FeatureLocation(
                        int(feature.location.start), newstart, feature.location.strand)
    records.append(record)

# Write all records once, so a multi-record file is not overwritten record by record
with open(outfile, "w") as new_gb:
    SeqIO.write(records, new_gb, "genbank")

This assumes usage such as: python program.py <genbankfile> <excel spreadsheet>

It also assumes the spreadsheet is formatted as follows:

Gene     Synonym  Tuberculist_annotated_start  Orientation  Re-annotated_start  BASS_score
Rv0005   gyrB     5240                         +            5357                -1.782
Rv0012   Rv0012   14089                        +            14134               -1.553
Rv0018c  pstP     23181                        -            23172               -2.077
Rv0032   bioF2    34295                        +            34307               -0.842
Rv0037c  Rv0037c  41202                        -            41163               -0.554
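
As a rough illustration of how the script interprets this table, the sketch below applies the same BASS-score cut-off (< -0.7) to the sample rows above, hard-coded as tuples instead of being read with xlrd. The names `ROWS` and `build_tss` are made up for this example and are not part of the original script.

```python
# Sample rows copied from the table above:
# (gene, synonym, annotated_start, orientation, reannotated_start, bass_score)
ROWS = [
    ("Rv0005", "gyrB", 5240, "+", 5357, -1.782),
    ("Rv0012", "Rv0012", 14089, "+", 14134, -1.553),
    ("Rv0018c", "pstP", 23181, "-", 23172, -2.077),
    ("Rv0032", "bioF2", 34295, "+", 34307, -0.842),
    ("Rv0037c", "Rv0037c", 41202, "-", 41163, -0.554),
]


def build_tss(rows, cutoff=-0.7):
    """Map locus_tag -> new start and strand for rows scoring below the cutoff."""
    tss = {}
    for gene, _synonym, _old_start, orientation, new_start, score in rows:
        if score < cutoff:
            tss[gene] = {"Direction": orientation, "StartCodon": int(new_start)}
    return tss


TSS = build_tss(ROWS)
print(sorted(TSS))
```

With these rows, Rv0037c is left out because its score of -0.554 does not pass the cut-off, so its location would stay unchanged in the output file.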

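Score: 1

Stack Overflow user

Answered 2014-07-09 09:02:19

You can try something like the following. The number of changes equals the number of gene/CDS pairs found in the file. You can read the new positions from a CSV or text file and build a list like the change_values list I created by hand here.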

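import re

change_values = ["1111", "2222"]  # new start positions, one per gene/CDS pair
flag = True
idx = 0  # index into change_values (renamed from "next", which shadows a builtin)
with open("test.txt") as f:
    for line in f:
        if line.startswith(' CDS') or line.startswith(' gene'):
            # count=1 replaces only the first number (the start) and keeps the end
            out = re.sub(r"\d+", change_values[idx], line, count=1)
            # Instead of print, write to a file
            print(out, end="")
            # A gene line is followed by its CDS line; advance after the pair
            flag = not flag
            if flag:
                idx += 1
        else:
            # Instead of print, write to a file
            print(line, end="")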

My sample test.txt file looks like this:

 gene            5240..7267
                 /db_xref="GeneID:887081"
                 /locus_tag="Rv0005"
                 /gene="gyrB"
 CDS             5240..7267
                 /locus_tag="Rv0005"
                 /inference="protein motif:PROSITE:PS00177"
                 ...........................

 gene            5240..7267
                 /db_xref="GeneID:887081"
                 /locus_tag="Rv0005"
                 /gene="gyrB"
 CDS             5240..7267
                 /locus_tag="Rv0005"
                 /inference="protein motif:PROSITE:PS00177"
                 ...........................
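
The change_values list in the snippet above is typed in by hand. Reading it from a CSV file, as suggested, might look like the sketch below. The column names `locus_tag` and `new_start` and the inline CSV text are assumptions made for illustration; nothing in the question fixes a file layout.

```python
import csv
import io

# Hypothetical CSV content; in practice this would come from an opened file
CSV_TEXT = """locus_tag,new_start
Rv0005,5357
Rv0012,14134
"""


def read_change_values(handle):
    """Return the new_start column as a list of strings, in file order."""
    return [row["new_start"] for row in csv.DictReader(handle)]


change_values = read_change_values(io.StringIO(CSV_TEXT))
print(change_values)
```

Because the text-based approach matches values to features purely by order, the rows of such a file would have to appear in the same order as the gene/CDS pairs in the GenBank file.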

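Hope this solves your problem. Cheers!

Score: 0

Stack Overflow user

Answered 2017-07-18 15:40:03

I think this can be done with native Biopython syntax, with no need for regular expressions. Here is a minimal working example: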

from Bio import SeqIO
from Bio import SeqFeature
import copy

gbk = SeqIO.read('./test_gbk', 'gb')

index = 1

feature_to_change = copy.deepcopy(gbk.features[index]) #depends what feature you want to change, 
#can also be done with loop if you want to change them all, or by some function...

new_start = 0
new_end = 100

new_feature_location = SeqFeature.FeatureLocation(new_start, new_end, feature_to_change.location.strand) #create a new feature location object, keeping the original strand

feature_to_change.location = new_feature_location #change old feature location

del gbk.features[index] #remove changed feature

gbk.features.append(feature_to_change) #add identical feature with new location

gbk.features = sorted(gbk.features, key = lambda feature: feature.location.start) # if you want them sorted by the start of the location, which is the usual case

SeqIO.write(gbk, './test_gbk_with_new_feature', 'gb')
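
The comment in the example above mentions that the same change can be made in a loop. A minimal sketch of that, assuming the new starts are keyed by locus_tag and given as 1-based positions as printed in a GenBank file, could look like this. `NEW_STARTS`, `move_start`, `update_record`, the in-memory record, and the Rv0018c start coordinate are all illustrative assumptions. The sketch only handles simple (non-joined) locations.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Hypothetical mapping: locus_tag -> new 1-based start codon position
NEW_STARTS = {"Rv0005": 5357, "Rv0018c": 23172}


def move_start(location, new_start):
    """Return a new simple location with the start codon moved.

    new_start is 1-based, as printed in a GenBank file; Biopython stores
    0-based, end-exclusive coordinates.
    """
    if location.strand == -1:
        # Reverse strand: the start codon sits at the high coordinate
        return FeatureLocation(int(location.start), new_start, location.strand)
    return FeatureLocation(new_start - 1, int(location.end), location.strand)


def update_record(record, new_starts):
    """Update gene/CDS locations in place for every locus_tag in new_starts."""
    for feature in record.features:
        if feature.type not in ("gene", "CDS"):
            continue
        tag = feature.qualifiers.get("locus_tag", [None])[0]
        if tag in new_starts:
            feature.location = move_start(feature.location, new_starts[tag])


# Small in-memory record standing in for a parsed GenBank file
record = SeqRecord(Seq("N" * 30000), id="demo")
record.features = [
    SeqFeature(FeatureLocation(5239, 7267, strand=1), type="gene",
               qualifiers={"locus_tag": ["Rv0005"]}),
    SeqFeature(FeatureLocation(21636, 23181, strand=-1), type="gene",
               qualifiers={"locus_tag": ["Rv0018c"]}),
]
update_record(record, NEW_STARTS)
for feature in record.features:
    print(feature.qualifiers["locus_tag"][0], feature.location)
```

Assigning to feature.location in place avoids the copy, delete, and re-append steps, and leaves the feature order untouched, so no re-sorting is needed.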

Score: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/24636588
