
Q: How do I skip titles that return too many search results (or that take too long to retrieve from Scopus)?

Stack Overflow user

Asked on 2021-12-28 14:04:14

Answers: 1 · Views: 75 · Followers: 0 · Votes: 0

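I want to open an Excel file and get the EIDs from ScopusSearch for a list of 1,400 article titles saved in a spreadsheet. I tried to retrieve the EIDs with the following code: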

import numpy as np
import pandas as pd
from pybliometrics.scopus import ScopusSearch

nan = pd.read_excel(r'C:\Users\Apples\Desktop\test\titles_nan.xlsx', sheet_name='nan')
error_index = {}

for i in range(len(nan)):
    scopus_title = nan.loc[i, 'Title']
    query = 'TITLE("{0}")'.format(scopus_title)
    print(query)
    try:
        # a single exact-title search per row
        s = ScopusSearch(query)
        nan.at[i, 'EID'] = s.results[0].eid
        print(str(i) + ' ' + s.results[0].eid)
    except Exception:
        # no results (s.results is None) or any other failure: record and move on
        nan.at[i, 'EID'] = np.nan
        error_index[i] = scopus_title
        print(str(i) + ' error')

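However, I cannot retrieve EIDs for more than roughly 100 titles, because some titles return so many results that they stall the whole process.

So I want to skip titles that return too many results and move on to the next title, while keeping a record of the skipped titles.

I'm new to Python, so I'm not sure how to do this. The logic I have in mind is as follows: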

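· If a title returns exactly 1 result, retrieve its EID and record it in the "EID" column of nan.

· If a title returns more than 1 result, record the title in error_index, print "Too many searches", and move on to the next title.

· If a title returns no results, record the title in error_index, print "error", and move on to the next title.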
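The three rules above can be sketched as a plain loop that does not depend on Scopus at all. This is a minimal sketch under stated assumptions: `process_titles`, `count_results` and `fetch_eid` are hypothetical names, the two callables stand in for the real Scopus calls, and the titles, counts and EIDs in the usage part are made up.

```python
from typing import Callable, Dict, List, Tuple


def process_titles(
    titles: List[str],
    count_results: Callable[[str], int],  # hypothetical: number of hits for a title
    fetch_eid: Callable[[str], str],      # hypothetical: EID of the single hit
) -> Tuple[Dict[int, str], Dict[int, str]]:
    """1 hit -> keep the EID; more than 1 or 0 hits -> record the title and skip."""
    eids: Dict[int, str] = {}
    error_index: Dict[int, str] = {}
    for i, title in enumerate(titles):
        n = count_results(title)
        if n == 1:
            eids[i] = fetch_eid(title)
            print(f"{i} {eids[i]}")
        elif n > 1:
            error_index[i] = title
            print(f"{i} Too many searches")
        else:
            error_index[i] = title
            print(f"{i} error")
    return eids, error_index


# Usage with made-up data instead of real Scopus calls
fake_counts = {"Title A": 1, "Title B": 5, "Title C": 0}
fake_eids = {"Title A": "fake-eid-1"}
eids, error_index = process_titles(
    list(fake_counts), fake_counts.__getitem__, fake_eids.__getitem__
)
```

With pybliometrics, `count_results` could wrap `ScopusSearch(q, download=False).get_results_size()` and `fetch_eid` could wrap `ScopusSearch(q).results[0].eid`, with `q = f'TITLE("{title}")'`; keeping the decision logic separate makes it testable without API calls.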

Attempt 1
for i in range(0,len(nan)):
    scopus_title = nan.at[i ,'Title']
    print('TITLE("{0}")'.format(scopus_title))
    s = ScopusSearch('TITLE("{0}")'.format(scopus_title))
    print(type(s))

    if(s.count()== 1):
        nan.at[i,"EID"] = s.results[0].eid
        print(str(i) + "   " + s.results[0].eid)
    elif(s.count()>1):
        continue
        print(str(i) + "  " + "Too many searches")
    else:
        error_index[i] = scopus_title
        print(str(i) + "error")

Attempt 2
for i in range(0,len(nan)):
    scopus_title = nan.at[i ,'Title']
    print('TITLE("{0}")'.format(scopus_title))
    s = ScopusSearch('TITLE("{0}")'.format(scopus_title))
    if len(s.results)== 1:
        nan.at[i,"EID"] = s.results[0].eid
        print(str(i) + "   " + s.results[0].eid)
    elif len(s.results)>1:  
        continue
        print(str(i) + "  " + "Too many searches")
    else:
        continue
        print(str(i) + "  " + "Error")

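I get errors saying that an object of type 'ScopusSearch' has no len()/count(), or that the searches are not lists themselves. I don't know how to proceed from here. I'm also not sure whether this is the right approach, i.e. skipping titles based on too many results. Is there a more efficient way (for example, a timeout that skips a title after a certain amount of search time)?

Any help with this would be greatly appreciated. Thanks!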
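A timeout like the one asked about can be sketched with the standard library's `concurrent.futures`: run each lookup in a worker thread and stop waiting after a fixed number of seconds. This is an assumption-laden sketch, not a pybliometrics feature: `lookup_with_timeout` and `slow_lookup` are hypothetical names, with `slow_lookup` standing in for a Scopus call. Python cannot forcibly stop a running thread, so a timed-out call keeps running in the background; only the wait is abandoned.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, Dict, Optional


def lookup_with_timeout(
    executor: ThreadPoolExecutor,
    func: Callable[[str], str],
    title: str,
    seconds: float,
) -> Optional[str]:
    """Return func(title), or None if it does not finish within `seconds`."""
    future = executor.submit(func, title)
    try:
        return future.result(timeout=seconds)
    except FutureTimeout:
        # The worker thread is not killed; we only stop waiting for it.
        return None


def slow_lookup(title: str) -> str:
    """Hypothetical stand-in for a Scopus query; titles containing "slow" take 2 s."""
    time.sleep(2.0 if "slow" in title else 0.01)
    return f"eid-for-{title}"


skipped: Dict[int, str] = {}
results: Dict[int, str] = {}
# Leaving the with-block waits for any abandoned (timed-out) calls to finish.
with ThreadPoolExecutor(max_workers=4) as executor:
    for i, title in enumerate(["fast one", "slow one", "fast two"]):
        eid = lookup_with_timeout(executor, slow_lookup, title, seconds=0.5)
        if eid is None:
            skipped[i] = title
            print(f"{i} Timed out")
        else:
            results[i] = eid
            print(f"{i} {eid}")
```

A timeout bounds the waiting time per title, but it does not reduce the number or size of API requests; checking the result count before downloading addresses the cause rather than the symptom.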

scopus

pybliometrics

python

1 Answer

Stack Overflow user

Accepted answer

Answered on 2021-12-28 14:57:51

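Use .get_results_size() in combination with download=False, so that only the number of results is requested and the results themselves are not downloaded: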

from pybliometrics.scopus import ScopusSearch

scopus_title = "Editorial"
q = f'TITLE("{scopus_title}")'  # this is f-string notation, btw
s = ScopusSearch(q, download=False)
s.get_results_size()
# 243142

If this number is below a certain threshold, simply run s = ScopusSearch(q) and proceed as in "Attempt 2":

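for i, row in nan.iterrows():
    q = f'TITLE("{row["Title"]}")'  # inner quotes must differ from the outer ones (before Python 3.12)
    print(q)
    s = ScopusSearch(q, download=False)  # fetch only the result count
    n = s.get_results_size()
    if n == 1:
        s = ScopusSearch(q)  # now download the single result
        nan.at[i, "EID"] = s.results[0].eid
        print(f"{i} {s.results[0].eid}")
    elif n > 1:
        error_index[i] = row["Title"]  # keep a record of skipped titles
        print(f"{i} Too many results")
        continue  # must come after print; code after continue never runs
    else:
        error_index[i] = row["Title"]
        print(f"{i} Error")
        continue  # must come after print; code after continue never runs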

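(I use .iterrows() here to avoid indexing by position. Note, however, that i is the index label: if the index is not a plain range, i will not equal the row position. In that case, wrap the iterator in enumerate().)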
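The enumerate() variant mentioned above can be sketched as follows. This assumes a DataFrame with a non-range index; the titles, index labels and EID strings are made up for illustration. `pos` is the 0-based row position, while `label` is the index value that `.at` needs to write back into the right row.

```python
import pandas as pd

# Made-up data with a non-range index, e.g. after filtering or sorting
nan = pd.DataFrame({"Title": ["Title A", "Title B", "Title C"]}, index=[10, 42, 7])
nan["EID"] = None

positions = []
for pos, (label, row) in enumerate(nan.iterrows()):
    # pos counts 0, 1, 2, ...; label is the index value (10, 42, 7)
    positions.append((pos, label))
    nan.at[label, "EID"] = f"eid-{pos}"  # write via the label, not the position
    print(f"{pos} {row['Title']}")
```

Printing `pos` gives a running counter for progress messages, while writing through `label` keeps each EID on the row it belongs to.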

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/70508153
