
PubMed: fetch article details into a DataFrame

Stack Overflow user
Asked 2022-04-25 22:37:36
1 answer · 129 views · 0 followers · 1 vote

Here is the code:

Code language: Python
import pandas as pd
from pymed import PubMed

pubmed = PubMed(tool="PubMedSearcher", email="myemail@ccc.com")

## PUT YOUR SEARCH TERM HERE ##
search_term = 'Charlie Brown'
results = pubmed.query(search_term, max_results=100000)

articleList = []
articleInfo = []

for article in results:
    # Each result is either a PubMedBookArticle or a PubMedArticle;
    # convert it to a dictionary with its toDict() method.
    articleDict = article.toDict()
    articleList.append(articleDict)

# Build a list of dict records holding the article details fetched from the PubMed API
for article in articleList:
    # Sometimes article['pubmed_id'] holds several IDs separated by newlines;
    # the first one is the article's own PubMed ID.
    pubmedId = article['pubmed_id'].partition('\n')[0]
    # Append the article info as a dictionary
    articleInfo.append({'pubmed_id': pubmedId,
                        'publication_date': article['publication_date'],
                        'authors': article['authors']})

df = pd.json_normalize(articleInfo)

Running this code produces a DataFrame with three columns: pubmed_id, publication_date, and authors.
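The shape of `df` can be checked without querying PubMed. The sketch below feeds made-up records into the same record-building loop. They are shaped like the `toDict()` output only as far as the question's code reads it (`pubmed_id`, `publication_date`, and `authors` as a list of dicts with `lastname`/`firstname`). All IDs and values here are hypothetical. It also shows that `json_normalize` leaves the `authors` lists untouched, which is why that column stays nested.

```python
import datetime

import pandas as pd

# Hypothetical stand-ins for article.toDict() output; values are made up.
articleList = [
    {"pubmed_id": "30857579\n11111111",  # extra newline-separated ID
     "publication_date": datetime.date(2019, 3, 13),
     "authors": [{"lastname": "Brown", "firstname": "Charlie"}]},
    {"pubmed_id": "16917639",
     "publication_date": datetime.date(2006, 8, 19),
     "authors": [{"lastname": "Drayton", "firstname": "William"},
                 {"lastname": "Brown", "firstname": "Charlie"}]},
]

articleInfo = []
for article in articleList:
    # Keep only the first newline-separated ID, as in the question's code.
    pubmedId = article["pubmed_id"].partition("\n")[0]
    articleInfo.append({"pubmed_id": pubmedId,
                        "publication_date": article["publication_date"],
                        "authors": article["authors"]})

# json_normalize flattens nested dicts, but list values are kept as-is.
df = pd.json_normalize(articleInfo)
print(df.columns.tolist())                  # ['pubmed_id', 'publication_date', 'authors']
print(type(df.loc[0, "authors"]).__name__)  # list
```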

Is there a way to unnest the authors column while keeping the other two columns? Thanks in advance.
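"Unnesting" can also mean one row per author instead of one joined string per article. A minimal sketch of that reading, assuming `df` has the three columns described above with `authors` as a list of dicts, uses `DataFrame.explode` followed by `pd.json_normalize` to spread each author dict into its own columns. The sample rows (including the empty-author article and its ID) are hypothetical.

```python
import pandas as pd

# Hypothetical rows shaped like the question's df (authors = list of dicts).
df = pd.DataFrame({
    "pubmed_id": ["16917639", "30857579", "12345678"],
    "publication_date": ["2006-08-19", "2019-03-13", "2020-01-01"],
    "authors": [
        [{"lastname": "Drayton", "firstname": "William"},
         {"lastname": "Brown", "firstname": "Charlie"}],
        [{"lastname": "Brown", "firstname": "Charlie"}],
        [],  # an article with no listed authors
    ],
})

# One row per (article, author); ignore_index gives a fresh 0..n-1 index
# so it lines up with the normalized author columns below.
long_df = df.explode("authors", ignore_index=True)

# An empty author list explodes to a single NaN row; swap that NaN for an
# empty dict so json_normalize yields a row of missing values instead.
authors = long_df["authors"].map(lambda a: a if isinstance(a, dict) else {})
author_cols = pd.json_normalize(authors.tolist())

# Replace the nested column with lastname/firstname (and any other keys).
long_df = pd.concat([long_df.drop(columns="authors"), author_cols], axis=1)
print(long_df)
```

This keeps pubmed_id and publication_date on every row, so the article-level columns repeat once per author.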


1 Answer

Stack Overflow user

Accepted answer

Answered 2022-04-26 04:35:28

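If you want to unnest the authors, you have to choose a strategy. For example, you can join all of an article's authors into a single string, writing each author as `lastname, firstname` and separating authors with `;`.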

Code language: Python
# New column to easily identify how many authors there are in the paper
df['n_authors'] = df['authors'].map(len)

# Unnest authors into a single string using the above-mentioned strategy
df['authors'] = df['authors'].map(lambda authors: ';'.join([f"{author['lastname']}, {author['firstname']}" for author in authors]))
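The f-string in the answer assumes every author has both a last and a first name. If an author entry comes back with one of them missing, for example a group or collective author that pymed might return with `None` values (an assumption worth checking on real data), the joined string would contain the text `None`. A hedged variant uses a hypothetical helper, `format_author`, that skips missing parts. The sample data is made up.

```python
import pandas as pd


def format_author(author):
    """Format one author dict as 'lastname, firstname', skipping missing parts.

    Hypothetical helper; assumes each author is a dict with optional
    'lastname' and 'firstname' keys whose values may be None.
    """
    parts = [author.get("lastname"), author.get("firstname")]
    return ", ".join(p for p in parts if p)


# Hypothetical sample data, including an author with no first name
# and an article with no authors at all.
df = pd.DataFrame({
    "pubmed_id": ["1", "2"],
    "authors": [
        [{"lastname": "Brown", "firstname": "Charlie"},
         {"lastname": "Study Group", "firstname": None}],
        [],
    ],
})

# Same two steps as the answer: count authors, then join them with ';'.
df["n_authors"] = df["authors"].map(len)
df["authors"] = df["authors"].map(
    lambda authors: ";".join(format_author(a) for a in authors))
print(df)
```

An article with no authors ends up with an empty string and `n_authors` equal to 0.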

Output:

   pubmed_id publication_date                                            authors  n_authors  
0   35435469       2022-04-19  Easwaran, Raju;Khan, Moin;Sancheti, Parag;Shya...         41  
1   34480858       2021-09-05  Flaxman, Amy;Marchevsky, Natalie G;Jenkin, Dan...         38  
2   30857579       2019-03-13                                     Brown, Charlie          1  
3   28640023       2017-06-24  Thornton, Kevin C;Schwarz, Jennifer J;Gross, A...         12  
4   24195874       2013-11-08  Bicket, Mark C;Gupta, Anita;Brown, Charlie H;C...          4  
5   21741796       2011-07-12  Bird, Jonathan H;Carmont, Michael R;Dhillon, M...          7  
6   21324873       2011-02-18  Cohen, Steven P;Brown, Charlie;Kurihara, Conni...          6  
7   20228712       2010-03-17  Cohen, Steven P;Kapoor, Shruti G;Nguyen, Cuong...          8  
8   20109957       2010-01-30  Cohen, Steven P;Brown, Charlie;Kurihara, Conni...          6  
9   18248779       2008-02-06  Whitaker, Iain S;Duggan, Eileen M;Alloway, Rit...         10  
10  16917639       2006-08-19  Drayton, William;Brown, Charlie;Hillhouse, Karin          3  
11  16282488       2005-11-12  Mao, Hanwen;Lafont, Bernard A P;Igarashi, Tats...          9  
12  14581571       2003-10-29  Moniuszko, Marcin;Brown, Charlie;Pal, Ranajit;...          7  
13  12163382       2002-08-07  Williams, Kenneth;Schwartz, Annette;Corey, Sar...         10 
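Because the joined column uses `;` between authors and `, ` between last and first name, it can be split back into lists when needed. A minimal sketch with `Series.str.split`, assuming no author name itself contains `;`. The sample value is copied from row 10 of the output above.

```python
import pandas as pd

# One joined value, copied from row 10 of the output above.
joined = pd.Series(["Drayton, William;Brown, Charlie;Hillhouse, Karin"])

# Back to a list of "lastname, firstname" strings per article.
as_lists = joined.str.split(";")

# The list length should match the n_authors value for that row.
print(as_lists[0])            # ['Drayton, William', 'Brown, Charlie', 'Hillhouse, Karin']
print(as_lists.str.len()[0])  # 3
```

An empty string splits to `['']`, so an article with zero authors would report a length of 1 here and needs separate handling.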
Votes: 1
Original page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/72006411
