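I am developing a function that downloads protein .pdb files from the web, as part of a larger codebase I am building to dock proteins and ligands generated by our AIBind machine learning model. For roughly 60% of these proteins, I can use a gene library to convert their HGNC IDs to PDB IDs, then query them through the UniProt and RCSB websites and download the PDB files. For the other 40%, however, only computationally generated AlphaFold PDB models exist, and the gene library I have been using does not recognize these proteins as having valid PDB IDs. Thankfully, the AlphaFold website has a search function: searching by HGNC ID returns a list of entries, and the top entry is the protein I am looking for 99% of the time.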

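Once I have the UniProt ID (Q7K0E6 in this example), I can navigate to the AlphaFold entry page and access the file server to download that protein's PDB file. I have already been able to do this successfully for the proteins in my database that have a registered UniProt ID.
I have been using the code below to scrape the search results page for an HGNC symbol entered as the search term, writing all of the page's HTML into a text file.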
import urllib.request

url = 'https://alphafold.ebi.ac.uk/search/text/'
fname = 'alphaname.txt'
HGNC = 'vr1'
# Append the HGNC symbol to the search URL
url = url + HGNC
# Fetch the raw HTML of the search page
get = urllib.request.urlopen(url)
html = get.read()
# Save the page contents to a file
with open(fname, "wb") as f:
    f.write(html)

When I search the file itself (manually, and via Python), I do not see any data for the entries returned as search results.

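How can I retrieve the data from the website's search function using Python?
Posted on 2022-04-20 16:04:37
The data is loaded via JavaScript from an external URL. You can use the requests module to simulate that request, for example: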
import json
import requests

api_url = "https://alphafold.ebi.ac.uk/api/search"
# Query parameters for the search API; "vr1" is the search term
params = {
    "q": "(text:*vr1 OR text:vr1*)",
    "type": "main",
    "start": "0",
    "rows": "20",
}
data = requests.get(api_url, params=params).json()
print(json.dumps(data, indent=4))

Prints:
{
"numFound": 112,
"start": 0,
"numFoundExact": true,
"docs": [
{
"entryId": "AF-O35433-F1",
"gene": "Trpv1",
"geneT": [
"Trpv1",
"Vr1",
"Vr1l"
],
"geneSynonyms": [
"Vr1",
"Vr1l"
],
"sequenceChecksum": "DAFC80B12BDF71BF",
"sequenceVersionDate": "1998-01-01",
"uniprotAccession": "O35433",
"uniprotAccessionT": "O35433",
"uniprotId": "TRPV1_RAT",
"uniprotDescription": "Transient receptor potential cation channel subfamily V member 1",
"protein": [
"Transient receptor potential cation channel subfamily V member 1",
"Capsaicin receptor",
"Osm-9-like TRP channel 1",
"Vanilloid receptor 1",
"Vanilloid receptor type 1-like",
"OTRPC1"
],
"taxId": 10116,
"organismScientificName": "Rattus norvegicus",
"organism": [
"Rattus norvegicus",
"Rat"
],
"globalMetricValue": 71.55,
"uniprotStart": 1,
"uniprotEnd": 838,
"uniprotSequence": "MEQRASLDSEESESPPQENSCLDPPDRDPNCKPPPVKPHIFTTRSRTRLFGKGDSEEASPLDCPYEEGGLASCPIITVSSVLTIQRPGDGPASVRPSSQDSVSAGEKPPRLYDRRSIFDAVAQSNCQELESLLPFLQRSKKRLTDSEFKDPETGKTCLLKAMLNLHNGQNDTIALLLDVARKTDSLKQFVNASYTDSYYKGQTALHIAIERRNMTLVTLLVENGADVQAAANGDFFKKTKGRPGFYFGELPLSLAACTNQLAIVKFLLQNSWQPADISARDSVGNTVLHALVEVADNTVDNTKFVTSMYNEILILGAKLHPTLKLEEITNRKGLTPLALAASSGKIGVLAYILQREIHEPECRHLSRKFTEWAYGPVHSSLYDLSCIDTCEKNSVLEVIAYSSSETPNRHDMLLVEPLNRLLQDKWDRFVKRIFYFNFFVYCLYMIIFTAAAYYRPVEGLPPYKLKNTVGDYFRVTGEILSVSGGVYFFFRGIQYFLQRRPSLKSLFVDSYSEILFFVQSLFMLVSVVLYFSQRKEYVASMVFSLAMGWTNMLYYTRGFQQMGIYAVMIEKMILRDLCRFMFVYLVFLFGFSTAVVTLIEDGKNNSLPMESTPHKCRGSACKPGNSYNSLYSTCLELFKFTIGMGDLEFTENYDFKAVFIILLLAYVILTYILLLNMLIALMGETVNKIAQESKNIWKLQRAITILDTEKSFLKCMRKAFRSGKLLQVGFTPDGKDDYRWCFRVDEVNWTTWNTNVGIINEDPGNCEGVKRTLSFSLRSGRVSGRNWKNFALVPLLRDASTRDRHATQQEEVQLKHYTGSLKPEDAEVFKDSMVPGEK",
"modelCreatedDate": "2021-07-01",
"organismCommonNames": [
"Rat"
],
"proteinFullNames": [
"Capsaicin receptor",
"Osm-9-like TRP channel 1",
"Vanilloid receptor 1",
"Vanilloid receptor type 1-like"
],
"proteinShortNames": [
"OTRPC1"
],
"latestVersion": 2,
"allVersions": [
1,
2
],
"_version_": 1723016518349881344
},
{
"entryId": "AF-Q7K0E6-F1",
"gene": "AspRS",
"geneT": [
...
https://stackoverflow.com/questions/71941240
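To get from the JSON response above to a downloadable model, something like the following sketch might work. The helper names (`build_search_params`, `pick_hit`, `build_pdb_url`) are hypothetical and not part of any library. The file URL pattern (`<entryId>-model_v<latestVersion>.pdb` under `/files/`) is an assumption that should be checked against the download link on an AlphaFold entry page. The sample dictionary only repeats fields copied from the output above, so the snippet runs without network access.

```python
from typing import Optional


def build_search_params(symbol: str, rows: int = 20) -> dict:
    """Build the same query parameters as the answer, for any gene symbol."""
    return {
        "q": f"(text:*{symbol} OR text:{symbol}*)",
        "type": "main",
        "start": "0",
        "rows": str(rows),
    }


def pick_hit(data: dict, tax_id: Optional[int] = None) -> Optional[dict]:
    """Return the first entry in data["docs"], optionally restricted to one taxId.

    Returns None when nothing matches.
    """
    for doc in data.get("docs", []):
        if tax_id is None or doc.get("taxId") == tax_id:
            return doc
    return None


def build_pdb_url(doc: dict) -> str:
    """Build a PDB download URL for one search hit.

    ASSUMPTION: the file server names models
    <entryId>-model_v<latestVersion>.pdb under /files/.
    """
    return (
        "https://alphafold.ebi.ac.uk/files/"
        f"{doc['entryId']}-model_v{doc['latestVersion']}.pdb"
    )


# Offline demonstration with fields copied from the response shown above.
sample = {
    "numFound": 112,
    "docs": [
        {
            "entryId": "AF-O35433-F1",
            "uniprotAccession": "O35433",
            "taxId": 10116,
            "latestVersion": 2,
        }
    ],
}

hit = pick_hit(sample, tax_id=10116)
if hit is not None:
    print(hit["uniprotAccession"])
    print(build_pdb_url(hit))
```

In real use, `sample` would be replaced by `requests.get(api_url, params=build_search_params("vr1")).json()`. Note that the first hit is not guaranteed to be the protein you want: in the output above, the Q7K0E6 entry from the question appears second, so filtering on a field such as `taxId` or `gene` is probably safer than always taking `docs[0]`.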