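I'm currently working on a project where I need to download a few thousand citations from PubMed. I'm using BioPython and have written the following code: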

from Bio import Entrez
from Bio import Medline
from pandas import *
from sys import argv
import os

Entrez.email = "my_email"
df = read_csv("my_file_path")
i=0

for index, row in df.iterrows():
    print (row.id)
    handle = Entrez.efetch(db="pubmed",rettype="medline",retmode="text", id=row.id)
    records = Medline.parse(handle)
    for record in records:
        try:
            abstract = str(record["AB"])
        except:
            abstract = "none"
        try:
            title = str(record["TI"])
        except:
            title = "none"
        try:
            mesh = str(record["MH"])
        except:
            mesh = "none"
        path = 'my_file_path'
        filename= str(row.id) + '.txt'
        filename = os.path.join(path, filename)
        file = open(filename, "w")
        output = "title: "+str(title) + "\n\n" + "abstract: "+str(abstract) + "\n\n" + "mesh: "+str(mesh) + "\n\n"
        file.write(output)
        file.close()
    print (i)
    i=i+1

However, when I run this code, I get the following error:

Traceback (most recent call last):
  File "my_file_path", line 13, in <module>
    handle = Entrez.efetch(db="pubmed",rettype="medline",retmode="text", id=row.id)
  File "/.../anaconda/lib/python3.5/site-packages/biopython-1.68-py3.5-macosx-10.6-x86_64.egg/Bio/Entrez/__init__.py", line 176, in efetch
    if ids.count(",") >= 200:
AttributeError: 'numpy.int64' object has no attribute 'count'

Here are the first few rows of the CSV file:

id
10029645
10073846
10078088
10080457
10088066
...

Answered on 2016-12-04 23:06:59

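Your error is in

handle = Entrez.efetch(db="pubmed",rettype="medline",retmode="text", id=row.id)

From the documentation:

id: UID list. Either a single UID or a comma-delimited list of UIDs.

From the examples I have seen, id is a string, not the numpy.int64 that comes out of pandas. The traceback shows why: efetch calls ids.count(","), a string method that numpy.int64 does not have. You should convert row.id to a string, for example id=str(row.id).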
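
To make the conversion concrete, here is a minimal sketch of one way to do it. The helper names `to_id_param` and `fetch_medline` are my own and not part of Biopython; the only library calls used are `Entrez.efetch` and `Medline.parse`, with the same arguments as in the question. The Biopython import sits inside `fetch_medline` only so that the conversion helper can be tried on its own.

```python
def to_id_param(ids):
    """Build the string that efetch expects for its id argument.

    Accepts one ID or an iterable of IDs (int, numpy.int64, or str) and
    returns a single ID or a comma-separated list as a plain str.
    Hypothetical helper, not part of Biopython.
    """
    # A lone ID (including a numpy scalar, which is not iterable) or a
    # single string is wrapped in a list so both cases share one code path.
    if isinstance(ids, str) or not hasattr(ids, "__iter__"):
        ids = [ids]
    return ",".join(str(int(i)) for i in ids)


def fetch_medline(ids, email):
    """Fetch MEDLINE records for the given PubMed IDs (needs network access).

    Hypothetical wrapper around the calls used in the question.
    """
    from Bio import Entrez, Medline

    Entrez.email = email
    handle = Entrez.efetch(db="pubmed", rettype="medline", retmode="text",
                           id=to_id_param(ids))
    try:
        return list(Medline.parse(handle))
    finally:
        handle.close()


if __name__ == "__main__":
    # Conversion only, no network call is made here.
    print(to_id_param(10029645))
    print(to_id_param([10029645, 10073846, 10078088]))
```

In the original loop this amounts to passing `id=str(row.id)`. Since the documentation says `id` may also be a comma-delimited list, something like `fetch_medline(df["id"][:100], "my_email")` could fetch many records per request instead of one, though I have not tested what batch size works best.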
https://stackoverflow.com/questions/40964681