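I have a bunch of proteins that come from something called ProteinNet. The sequences there carry some kind of ID, but it is clearly not a PDB ID, so I need another way to find it. For each protein I have its amino acid sequence. I am using Biopython, but I am not very experienced with it yet and could not find this in the guide.
So my question is: given the amino acid sequence of a protein, how do I find its PDB ID? (So that I can download the protein's PDB file.)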
Posted on 2021-03-16 02:35:31
Hi, I played with the RCSB PDB Search API a while ago
and ended up with this code (I can no longer find the example on the RCSB PDB website):
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sun Dec 27 16:20:43 2020
@author: Pietro
"""
import PDB_searchAPI_5
from PDB_searchAPI_5.rest import ApiException
import json
#"value":"STEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAARTVESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQHKLRKLNPPDESGPGCMSCKCVLS"
# Defining the host is optional and defaults to https://search.rcsb.org/rcsbsearch/v1
# See configuration.py for a list of all supported configuration parameters.
configuration = PDB_searchAPI_5.Configuration(
    host = "http://search.rcsb.org/rcsbsearch/v1"
)
data_entry_1 = '''{
  "query": {
    "type": "terminal",
    "service": "sequence",
    "parameters": {
      "evalue_cutoff": 1,
      "identity_cutoff": 0.9,
      "target": "pdb_protein_sequence",
      "value": "STEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAARTVESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQHKLRKLNPPDESGPGCMSCKCVLS"
    }
  },
  "request_options": {
    "scoring_strategy": "sequence"
  },
  "return_type": "entry"
}'''
# Enter a context with an instance of the API client
with PDB_searchAPI_5.ApiClient(configuration) as api_client:
    # Create an instance of the API class
    api_instance = PDB_searchAPI_5.SearchServiceApi(api_client)
    try:
        # Send the JSON query to the search service
        pippo = api_instance.run_json_queries_get(data_entry_1)
    except ApiException as e:
        print("Exception when calling SearchServiceApi->run_json_queries_get: %s\n" % e)
        raise SystemExit(1)
    # Inspect the response object
    print(type(pippo))
    print(dir(pippo))
    pippox = pippo.__dict__
    print('\n bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb \n', pippox)
    print('\n\n ********************************* \n\n')
    print(type(pippox))
    # result_set holds one item per hit
    pippoy = pippo.result_set
    print(type(pippoy))
    for i in pippoy:
        print('\n', i, '\n', type(i))
    print('\n LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL\n')
    # Print each hit once: its identifier and its score
    for i in pippoy:
        print('\n', i['identifier'], ' score : ', i['score'])
The search module (import PDB_searchAPI_5) was generated with openapi-generator-cli-4.3.1.jar.
The OpenAPI specification was at version 1.7.3 at the time and is now at 1.7.15; see https://search.rcsb.org/openapi.json
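The same JSON query can probably also be sent without a generated client. The following is only a minimal sketch using the Python standard library: the `/query` path appended to the v1 host, the helper names `build_sequence_query` and `search_pdb_ids`, and the handling of an empty response are my assumptions, not something taken from the RCSB documentation, and newer API versions may use different paths and parameter names.

```python
import json
import urllib.request

# Assumption: the query endpoint lives under the same v1 host used above.
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v1/query"


def build_sequence_query(sequence, identity_cutoff=0.9, evalue_cutoff=1):
    """Return the sequence-search query from data_entry_1 as a Python dict."""
    return {
        "query": {
            "type": "terminal",
            "service": "sequence",
            "parameters": {
                "evalue_cutoff": evalue_cutoff,
                "identity_cutoff": identity_cutoff,
                "target": "pdb_protein_sequence",
                "value": sequence,
            },
        },
        "request_options": {"scoring_strategy": "sequence"},
        "return_type": "entry",
    }


def search_pdb_ids(sequence, url=SEARCH_URL, **cutoffs):
    """POST the query and return (identifier, score) pairs. Needs network access."""
    body = json.dumps(build_sequence_query(sequence, **cutoffs)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        raw = response.read()
    if not raw:
        # Assumption: an empty body means the search found no hits.
        return []
    result = json.loads(raw)
    return [(hit["identifier"], hit["score"]) for hit in result.get("result_set", [])]
```

Calling `search_pdb_ids("STEYKLVVVGAG...")` with a full sequence would then give the same identifier and score pairs that the loop above prints, provided the endpoint assumption holds.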
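The data_entry_1 part was copied from the RCSB PDB website, but I can no longer find it there. It said that MMseqs2 is the software that performs the search. I played with the
"evalue_cutoff": 1,
"identity_cutoff": 0.9, parameters, but did not find a way to select only hits with 100% identity.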
PDB_searchAPI_5 was installed in a virtual environment with:
pip install PDB-searchAPI-5-1.0.0.tar.gz
It was generated by openapi-generator-cli-4.3.1.jar with:
java -jar openapi-generator-cli-4.3.1.jar generate -g python -i pdb-search-api-openapi.json --additional-properties=generateSourceCodeOnly=True,packageName=PDB_searchAPI_5
Do not put spaces in the --additional-properties part (it took me a week to figure that out).
The generated README.md file is the most important part, because it explains how to use the OpenAPI client.
You need to put your FASTA sequence here: "value":"STEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAARTVESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQHKLRKLNPPDESGPGCMSCKCVLS"
A score of 1 should be an exact match.
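If a score of 1 really does mark an exact match, the hits could be filtered on the client side and turned into download links. This is only a sketch under that assumption: the helper names `exact_matches` and `pdb_file_url` are made up for illustration, the hits are assumed to be dict-like items with `identifier` and `score` keys as in the printing loop above, and the download URL follows the pattern used by files.rcsb.org.

```python
import math


def exact_matches(result_set, tolerance=1e-9):
    """Return identifiers of hits whose score equals 1 within a small tolerance."""
    return [
        hit["identifier"]
        for hit in result_set
        if math.isclose(hit["score"], 1.0, abs_tol=tolerance)
    ]


def pdb_file_url(pdb_id):
    """Build the download URL of the PDB-format file for one entry ID."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


# Placeholder hits for illustration only; these are not real search results.
hits = [
    {"identifier": "AAAA", "score": 1.0},
    {"identifier": "BBBB", "score": 0.93},
]
for pdb_id in exact_matches(hits):
    print(pdb_id, pdb_file_url(pdb_id))
```

Filtering after the search avoids depending on how the server interprets identity_cutoff, at the cost of fetching hits that are then discarded.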
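The Biopython BLAST module is probably simpler, but it BLASTs against the NIH databases rather than RCSB PDB. Sorry that I cannot go into more detail on this: I still need to figure out what the JSON file is, and I could not find a better free tool to automatically generate a better OpenAPI Python client (I am sure that is not an easy task, but we always want more).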
To get the API documentation, try:
java -jar openapi-generator-cli-4.3.1.jar generate -g html -i https://search.rcsb.org/openapi.json --skip-validate-spec
This gives you HTML documentation; for a PDF, see https://mrin9.github.io/RapiPdf/
http://search.rcsb.org/openapi.json works just as well as https://search.rcsb.org/openapi.json, so you can use Wireshark to watch the exchange between client and server.
https://stackoverflow.com/questions/66590533