Question: Extracting FASTA Moonlight Protein Sequences with Python

Stack Overflow user

Asked 2016-09-21 02:18:28

1 answer, 158 views, 0 followers, 0 votes

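I want to use Python to extract the FASTA files containing the amino acid sequences from the Moonlighting Proteins Database ( www.moonlightingproteins.org/results.php?search_text= ). Since this is an iterative process, I would rather learn how to program it than do it by hand, because come on, it's 2016. The problem is that I don't know how to write the code, as I'm a novice programmer. The basic pseudocode would be: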

 for protein_name in site: www.moonlightingproteins.org/results.php?search_text=:

       go to the uniprot option 

       download the fasta file 

       store it in a .txt file inside a given folder

Thanks in advance!

python

database

data-mining

bioinformatics

protein-database

1 Answer

Stack Overflow user

Answered 2016-09-21 05:17:05

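I would strongly advise asking the authors for the database. From the FAQ:

I would like to use the MoonProt database in a project to analyze the amino acid sequences or structures using bioinformatics.

Please contact us at bioinformatics@moonlightingproteins.org if you are interested in using the MoonProt database for analysis of sequences and/or structures of moonlighting proteins.

Assuming you find something interesting, how are you going to cite it in your paper or thesis? "The sequences were scraped from a public webpage without the consent of the authors." It is much better to give credit to the original researchers.

This is a good introduction to scraping.

But back to your original question.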

import requests
from lxml import html

# Download one protein at a time; change 3 to any other id.
page = requests.get('http://www.moonlightingproteins.org/detail.php?id=3', timeout=30)
# Stop early on an HTTP error instead of parsing an error page.
page.raise_for_status()
# Convert the HTML document into a tree we can query from Python.
tree = html.fromstring(page.content)
# Get all table cells.
cells = tree.xpath('//td')

for i, cell in enumerate(cells):
    # If the cell text looks like a FASTA record, print it.
    if cell.text and cell.text.startswith('>'):
        print(cell.text)
    # If a cell mentions UniProt, print the link from the next cell,
    # provided there is a next cell.
    if 'UniProt' in cell.text_content() and i + 1 < len(cells):
        link = cells[i + 1].find('a')
        if link is not None and 'href' in link.attrib:
            print(link.attrib['href'])
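
The snippet above only prints what it finds for a single protein. To cover the remaining steps of the pseudocode in the question (follow the UniProt link, download the FASTA file, store it in a folder), something like the following rough sketch might work. It rests on several assumptions that are not part of the original answer: that the printed links contain a UniProt accession, that UniProt serves a single entry as FASTA at `https://rest.uniprot.org/uniprotkb/<accession>.fasta`, and that the helper names (`uniprot_accession`, `fasta_url`, `save_fasta`, `download_all`) are purely illustrative. The download itself is passed in as a function so the sketch does not depend on any particular HTTP library.

```python
import re
import time
from pathlib import Path

# Accession pattern as published in the UniProt help pages.
ACCESSION_RE = re.compile(
    r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
)


def uniprot_accession(href):
    """Return the first UniProt accession found in a link, or None."""
    match = ACCESSION_RE.search(href)
    return match.group(0) if match else None


def fasta_url(accession):
    """Build the FASTA download URL (assumed UniProt REST endpoint)."""
    return f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def save_fasta(record, accession, folder):
    """Write one FASTA record to <folder>/<accession>.txt and return the path."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{accession}.txt"
    path.write_text(record.strip() + "\n", encoding="utf-8")
    return path


def download_all(hrefs, fetch_text, folder, delay=1.0):
    """Download and store a FASTA file for every link with an accession.

    fetch_text is any callable that takes a URL and returns the response
    body as a string (hypothetical; supply your own).
    """
    saved = []
    for href in hrefs:
        accession = uniprot_accession(href)
        if accession is None:
            continue  # not a UniProt link we can interpret
        record = fetch_text(fasta_url(accession))
        # Only keep responses that look like FASTA.
        if record.lstrip().startswith(">"):
            saved.append(save_fasta(record, accession, folder))
        time.sleep(delay)  # be polite to the server between requests
    return saved
```

With `requests` installed, a call could look like `download_all(links, lambda url: requests.get(url, timeout=30).text, "fasta_files")`, where `links` is the list of UniProt hrefs collected by the loop above instead of being printed.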

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/39601114
