文章/答案/技术大牛

发布

社区首页 >问答首页 >rentrez错误HTTP失败: 400当下载超过1/3的记录时

问rentrez错误HTTP失败: 400当下载超过1/3的记录时
EN

Stack Overflow用户

提问于 2018-07-16 21:18:34

回答 1查看 333关注 0票数 0

我有个奇怪的情况。我正在使用PubMed挖掘rentrez数据。当我运行entrez_search()，然后运行entrez_summary()，然后运行entrez_fetch()，就会得到这个错误消息(post底部的完整代码)：

Error: HTTP failure: 400
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE eEfetchResult PUBLIC "-//NLM//DTD efetch 20131226//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20131226/efetch.dtd">
<eFetchResult>
    <ERROR>Cannot retrieve history data. query_key: 1, WebEnv: NCID_1_51629226_130.14.18.34_9001_1531773486_1795859931_0MetA0_S_MegaStore, retstart: 0, retmax: 552</ERROR>
    <ERROR>Can't fetch uids from history because of: NCBI C++ Exception:
    Error: UNK_MODULE(CException::eInvalid) "UNK_FILE", line 18446744073709551615: UNK_FUNC --- 
</ERROR>
</eFetchResult>

在四处搜索之后，我想我已经在这一讨论中找到了查询大小的解决方案。当我将retmax_set从500降到10时，代码起作用了。然后，我迭代地确定了不会抛出错误的最大retmax_set值，并发现了在我看来非常奇怪的行为。

term_set = "transcription AND enhancer AND promoter AND 2017:2018[PDAT]"的搜索记录有552条。使用不同的retmax值运行我的代码时

设置retmax_set <= 183工作
设置retmax_set >= 184将给出上述错误

修改后的term_set = "transcription AND enhancer AND promoter AND 2018[PDAT]"搜索记录有186个。使用不同的retmax值运行此搜索时

设置retmax_set <= 61作品
设置retmax_set >= 62将给出上述错误

搜索term_set = "transcription AND enhancer AND promoter AND 2017[PDAT]"记录了395条记录(出于某种原因，PubMed在2017年和2018年公布了29条记录)。使用不同的retmax值在此搜索项上运行代码时

设置retmax_set <= 131作品
设置retmax_set >= 132将给出上述错误

有趣的是，当retmax值大于记录总数的三分之一时，所有三个搜索都开始失败(552 /3= 184,186 /3= 62,395 /3= 131.67)。我将修改我的代码，根据entrez_search返回的结果数来计算entrez_search，但我不知道为什么rentrez或NCBI会这样做。有什么想法吗？

> ##  set search term 
> term_set = "transcription AND enhancer AND promoter AND 2017:2018[PDAT]"
> ##  load package
> library(rentrez)
> ##  set maximum records batch
> retmax_set = 182
> ##  search pubmed using web history
> search <- entrez_search(
+   db = "pubmed", 
+   term = term_set, 
+   use_history = T
+ )
> ##  get summaries of search hits 
> summary <- list(); for (seq_start in seq(0, search$count - 1, retmax_set)) {
+     summary1 <- entrez_summary(
+         db = "pubmed", 
+         web_history = search$web_history, 
+         retmax = retmax_set, 
+         retstart = seq_start
+     )
+     summary <- c(summary, summary1)
+ }
> ##  download full XML refs for hits
> XML_refs <- entrez_fetch(
+     db = "pubmed", 
+     web_history = search$web_history, 
+     rettype = "xml", 
+     parsed = TRUE
+ )
> 
> 
> ##  set search term 
> term_set = "transcription AND enhancer AND promoter AND 2017:2018[PDAT]"
> ##  load package
> library(rentrez)
> ##  set maximum records batch
> retmax_set = 183
> ##  search pubmed using web history
> search <- entrez_search(
+   db = "pubmed", 
+   term = term_set, 
+   use_history = T
+ )
> ##  get summaries of search hits 
> summary <- list(); for (seq_start in seq(0, search$count - 1, retmax_set)) {
+     summary1 <- entrez_summary(
+         db = "pubmed", 
+         web_history = search$web_history, 
+         retmax = retmax_set, 
+         retstart = seq_start
+     )
+     summary <- c(summary, summary1)
+ }
> ##  download full XML refs for hits
> XML_refs <- entrez_fetch(
+     db = "pubmed", 
+     web_history = search$web_history, 
+     rettype = "xml", 
+     parsed = TRUE
+ )
> 
> 
> ##  set search term 
> term_set = "transcription AND enhancer AND promoter AND 2017:2018[PDAT]"
> ##  load package
> library(rentrez)
> ##  set maximum records batch
> retmax_set = 184
> ##  search pubmed using web history
> search <- entrez_search(
+   db = "pubmed", 
+   term = term_set, 
+   use_history = T
+ )
> ##  get summaries of search hits 
> summary <- list(); for (seq_start in seq(0, search$count - 1, retmax_set)) {
+     summary1 <- entrez_summary(
+         db = "pubmed", 
+         web_history = search$web_history, 
+         retmax = retmax_set, 
+         retstart = seq_start
+     )
+     summary <- c(summary, summary1)
+ }
> ##  download full XML refs for hits
> XML_refs <- entrez_fetch(
+     db = "pubmed", 
+     web_history = search$web_history, 
+     rettype = "xml", 
+     parsed = TRUE
+ )
Error: HTTP failure: 400
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE eEfetchResult PUBLIC "-//NLM//DTD efetch 20131226//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20131226/efetch.dtd">
<eFetchResult>
    <ERROR>Cannot retrieve history data. query_key: 1, WebEnv: NCID_1_51629226_130.14.18.34_9001_1531773486_1795859931_0MetA0_S_MegaStore, retstart: 0, retmax: 552</ERROR>
    <ERROR>Can't fetch uids from history because of: NCBI C++ Exception:
    Error: UNK_MODULE(CException::eInvalid) "UNK_FILE", line 18446744073709551615: UNK_FUNC --- 
</ERROR>
</eFetchResult>
> 
> 
> ##  set search term 
> term_set = "transcription AND enhancer AND promoter AND 2017:2018[PDAT]"
> ##  load package
> library(rentrez)
> ##  set maximum records batch
> retmax_set = 185
> ##  search pubmed using web history
> search <- entrez_search(
+   db = "pubmed", 
+   term = term_set, 
+   use_history = T
+ )
> ##  get summaries of search hits 
> summary <- list(); for (seq_start in seq(0, search$count - 1, retmax_set)) {
+     summary1 <- entrez_summary(
+         db = "pubmed", 
+         web_history = search$web_history, 
+         retmax = retmax_set, 
+         retstart = seq_start
+     )
+     summary <- c(summary, summary1)
+ }
> ##  download full XML refs for hits
> XML_refs <- entrez_fetch(
+     db = "pubmed", 
+     web_history = search$web_history, 
+     rettype = "xml", 
+     parsed = TRUE
+ )
Error: HTTP failure: 400
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE eEfetchResult PUBLIC "-//NLM//DTD efetch 20131226//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20131226/efetch.dtd">
<eFetchResult>
    <ERROR>Cannot retrieve history data. query_key: 1, WebEnv: NCID_1_52654089_130.14.22.215_9001_1531773493_484860305_0MetA0_S_MegaStore, retstart: 0, retmax: 552</ERROR>
    <ERROR>Can't fetch uids from history because of: NCBI C++ Exception:
    Error: UNK_MODULE(CException::eInvalid) "UNK_FILE", line 18446744073709551615: UNK_FUNC --- 
</ERROR>
</eFetchResult>

rentrez

回答 1

Stack Overflow用户

回答已采纳

发布于 2019-02-08 17:10:49

结果表明，rentrez使用0-基数计数。因此，552条记录对应于0到551的retstart值。因为我的代码是查找值1到552，所以它错过了第一个记录(#0)，然后在查找不存在的记录#552时抛出一个错误。

票数 1

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/51370248

复制

相似问题

问rentrez错误HTTP失败: 400当下载超过1/3的记录时
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问rentrez错误HTTP失败: 400当下载超过1/3的记录时EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问rentrez错误HTTP失败: 400当下载超过1/3的记录时
EN