首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >rentrez错误HTTP失败: 400当下载超过1/3的记录时

rentrez错误HTTP失败: 400当下载超过1/3的记录时
EN

Stack Overflow用户
提问于 2018-07-16 21:18:34
回答 1查看 333关注 0票数 0

我有个奇怪的情况。我正在使用PubMed挖掘rentrez数据。当我运行entrez_search(),然后运行entrez_summary(),然后运行entrez_fetch(),就会得到这个错误消息(post底部的完整代码):

代码语言:javascript
复制
Error: HTTP failure: 400
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE eEfetchResult PUBLIC "-//NLM//DTD efetch 20131226//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20131226/efetch.dtd">
<eFetchResult>
    <ERROR>Cannot retrieve history data. query_key: 1, WebEnv: NCID_1_51629226_130.14.18.34_9001_1531773486_1795859931_0MetA0_S_MegaStore, retstart: 0, retmax: 552</ERROR>
    <ERROR>Can't fetch uids from history because of: NCBI C++ Exception:
    Error: UNK_MODULE(CException::eInvalid) "UNK_FILE", line 18446744073709551615: UNK_FUNC --- 
</ERROR>
</eFetchResult>

在四处搜索之后,我想我已经在这一讨论中找到了查询大小的解决方案。当我将retmax_set从500降到10时,代码起作用了。然后,我迭代地确定了不会抛出错误的最大retmax_set值,并发现了在我看来非常奇怪的行为。

term_set = "transcription AND enhancer AND promoter AND 2017:2018[PDAT]"的搜索记录有552条。使用不同的retmax值运行我的代码时

  • 设置retmax_set <= 183工作
  • 设置retmax_set >= 184将给出上述错误

修改后的term_set = "transcription AND enhancer AND promoter AND 2018[PDAT]"搜索记录有186个。使用不同的retmax值运行此搜索时

  • 设置retmax_set <= 61作品
  • 设置retmax_set >= 62将给出上述错误

搜索term_set = "transcription AND enhancer AND promoter AND 2017[PDAT]"记录了395条记录(出于某种原因,PubMed在2017年和2018年公布了29条记录)。使用不同的retmax值在此搜索项上运行代码时

  • 设置retmax_set <= 131作品
  • 设置retmax_set >= 132将给出上述错误

有趣的是,当retmax值大于记录总数的三分之一时,所有三个搜索都开始失败(552 /3= 184,186 /3= 62,395 /3= 131.67)。我将修改我的代码,根据entrez_search返回的结果数来计算entrez_search,但我不知道为什么rentrez或NCBI会这样做。有什么想法吗?

代码语言:javascript
复制
> ##  set search term 
> term_set = "transcription AND enhancer AND promoter AND 2017:2018[PDAT]"
> ##  load package
> library(rentrez)
> ##  set maximum records batch
> retmax_set = 182
> ##  search pubmed using web history
> search <- entrez_search(
+   db = "pubmed", 
+   term = term_set, 
+   use_history = T
+ )
> ##  get summaries of search hits 
> summary <- list(); for (seq_start in seq(0, search$count - 1, retmax_set)) {
+     summary1 <- entrez_summary(
+         db = "pubmed", 
+         web_history = search$web_history, 
+         retmax = retmax_set, 
+         retstart = seq_start
+     )
+     summary <- c(summary, summary1)
+ }
> ##  download full XML refs for hits
> XML_refs <- entrez_fetch(
+     db = "pubmed", 
+     web_history = search$web_history, 
+     rettype = "xml", 
+     parsed = TRUE
+ )
> 
> 
> ##  set search term 
> term_set = "transcription AND enhancer AND promoter AND 2017:2018[PDAT]"
> ##  load package
> library(rentrez)
> ##  set maximum records batch
> retmax_set = 183
> ##  search pubmed using web history
> search <- entrez_search(
+   db = "pubmed", 
+   term = term_set, 
+   use_history = T
+ )
> ##  get summaries of search hits 
> summary <- list(); for (seq_start in seq(0, search$count - 1, retmax_set)) {
+     summary1 <- entrez_summary(
+         db = "pubmed", 
+         web_history = search$web_history, 
+         retmax = retmax_set, 
+         retstart = seq_start
+     )
+     summary <- c(summary, summary1)
+ }
> ##  download full XML refs for hits
> XML_refs <- entrez_fetch(
+     db = "pubmed", 
+     web_history = search$web_history, 
+     rettype = "xml", 
+     parsed = TRUE
+ )
> 
> 
> ##  set search term 
> term_set = "transcription AND enhancer AND promoter AND 2017:2018[PDAT]"
> ##  load package
> library(rentrez)
> ##  set maximum records batch
> retmax_set = 184
> ##  search pubmed using web history
> search <- entrez_search(
+   db = "pubmed", 
+   term = term_set, 
+   use_history = T
+ )
> ##  get summaries of search hits 
> summary <- list(); for (seq_start in seq(0, search$count - 1, retmax_set)) {
+     summary1 <- entrez_summary(
+         db = "pubmed", 
+         web_history = search$web_history, 
+         retmax = retmax_set, 
+         retstart = seq_start
+     )
+     summary <- c(summary, summary1)
+ }
> ##  download full XML refs for hits
> XML_refs <- entrez_fetch(
+     db = "pubmed", 
+     web_history = search$web_history, 
+     rettype = "xml", 
+     parsed = TRUE
+ )
Error: HTTP failure: 400
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE eEfetchResult PUBLIC "-//NLM//DTD efetch 20131226//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20131226/efetch.dtd">
<eFetchResult>
    <ERROR>Cannot retrieve history data. query_key: 1, WebEnv: NCID_1_51629226_130.14.18.34_9001_1531773486_1795859931_0MetA0_S_MegaStore, retstart: 0, retmax: 552</ERROR>
    <ERROR>Can't fetch uids from history because of: NCBI C++ Exception:
    Error: UNK_MODULE(CException::eInvalid) "UNK_FILE", line 18446744073709551615: UNK_FUNC --- 
</ERROR>
</eFetchResult>
> 
> 
> ##  set search term 
> term_set = "transcription AND enhancer AND promoter AND 2017:2018[PDAT]"
> ##  load package
> library(rentrez)
> ##  set maximum records batch
> retmax_set = 185
> ##  search pubmed using web history
> search <- entrez_search(
+   db = "pubmed", 
+   term = term_set, 
+   use_history = T
+ )
> ##  get summaries of search hits 
> summary <- list(); for (seq_start in seq(0, search$count - 1, retmax_set)) {
+     summary1 <- entrez_summary(
+         db = "pubmed", 
+         web_history = search$web_history, 
+         retmax = retmax_set, 
+         retstart = seq_start
+     )
+     summary <- c(summary, summary1)
+ }
> ##  download full XML refs for hits
> XML_refs <- entrez_fetch(
+     db = "pubmed", 
+     web_history = search$web_history, 
+     rettype = "xml", 
+     parsed = TRUE
+ )
Error: HTTP failure: 400
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE eEfetchResult PUBLIC "-//NLM//DTD efetch 20131226//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20131226/efetch.dtd">
<eFetchResult>
    <ERROR>Cannot retrieve history data. query_key: 1, WebEnv: NCID_1_52654089_130.14.22.215_9001_1531773493_484860305_0MetA0_S_MegaStore, retstart: 0, retmax: 552</ERROR>
    <ERROR>Can't fetch uids from history because of: NCBI C++ Exception:
    Error: UNK_MODULE(CException::eInvalid) "UNK_FILE", line 18446744073709551615: UNK_FUNC --- 
</ERROR>
</eFetchResult>
EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2019-02-08 17:10:49

结果表明,rentrez使用0-基数计数。因此,552条记录对应于0到551的retstart值。因为我的代码是查找值1到552,所以它错过了第一个记录(#0),然后在查找不存在的记录#552时抛出一个错误。

票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/51370248

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档