I have a strange situation. I'm mining PubMed data with rentrez. When I run entrez_search(), then entrez_summary(), then entrez_fetch(), I get this error message (full code at the bottom of the post):
Error: HTTP failure: 400
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE eEfetchResult PUBLIC "-//NLM//DTD efetch 20131226//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20131226/efetch.dtd">
<eFetchResult>
<ERROR>Cannot retrieve history data. query_key: 1, WebEnv: NCID_1_51629226_130.14.18.34_9001_1531773486_1795859931_0MetA0_S_MegaStore, retstart: 0, retmax: 552</ERROR>
<ERROR>Can't fetch uids from history because of: NCBI C++ Exception:
Error: UNK_MODULE(CException::eInvalid) "UNK_FILE", line 18446744073709551615: UNK_FUNC ---
</ERROR>
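</eFetchResult>

After searching around, I think I found the solution in this discussion about query size. When I lowered retmax_set from 500 to 10, the code worked. I then iteratively determined the largest retmax_set value that does not throw the error, and found behavior that seems very strange to me.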
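The search term_set = "transcription AND enhancer AND promoter AND 2017:2018[PDAT]" returns 552 records. Running my code with different retmax values:
- retmax_set <= 183 works
- retmax_set >= 184 gives the error above

The modified search term_set = "transcription AND enhancer AND promoter AND 2018[PDAT]" returns 186 records. Running this search with different retmax values:
- retmax_set <= 61 works
- retmax_set >= 62 gives the error above

The search term_set = "transcription AND enhancer AND promoter AND 2017[PDAT]" returns 395 records (for some reason, PubMed lists 29 records as published in both 2017 and 2018). Running the code on this search term with different retmax values: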
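- retmax_set <= 131 works
- retmax_set >= 132 gives the error above

Interestingly, all three searches start failing once retmax_set reaches one third of the total record count (552 / 3 = 184, 186 / 3 = 62, 395 / 3 = 131.67). I'm going to modify my code to calculate retmax_set from the number of results returned by entrez_search, but I have no idea why rentrez or NCBI behaves this way. Any ideas?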
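A quick R check of the one-third pattern, using only the record counts and the first failing retmax_set values reported above. It reproduces the observation (and the 29-record overlap between the 2017 and 2018 searches); it does not explain the cause. The vector names are illustrative labels only.

```r
# Record counts for the three searches and the smallest retmax_set that failed
counts <- c(pdat_2017_2018 = 552, pdat_2018 = 186, pdat_2017 = 395)
first_failing <- c(pdat_2017_2018 = 184, pdat_2018 = 62, pdat_2017 = 132)

# Smallest whole number that is >= count / 3 for each search: 184, 62, 132
threshold <- ceiling(counts / 3)
stopifnot(identical(threshold, first_failing))

# Records counted in both single-year searches: 395 + 186 - 552 = 29
overlap <- counts[["pdat_2017"]] + counts[["pdat_2018"]] - counts[["pdat_2017_2018"]]
print(threshold)
print(overlap)
```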
> ## set search term
> term_set = "transcription AND enhancer AND promoter AND 2017:2018[PDAT]"
> ## load package
> library(rentrez)
> ## set maximum records batch
> retmax_set = 182
> ## search pubmed using web history
> search <- entrez_search(
+ db = "pubmed",
+ term = term_set,
+ use_history = T
+ )
> ## get summaries of search hits
> summary <- list(); for (seq_start in seq(0, search$count - 1, retmax_set)) {
+ summary1 <- entrez_summary(
+ db = "pubmed",
+ web_history = search$web_history,
+ retmax = retmax_set,
+ retstart = seq_start
+ )
+ summary <- c(summary, summary1)
+ }
> ## download full XML refs for hits
> XML_refs <- entrez_fetch(
+ db = "pubmed",
+ web_history = search$web_history,
+ rettype = "xml",
+ parsed = TRUE
+ )
>
>
> ## set search term
> term_set = "transcription AND enhancer AND promoter AND 2017:2018[PDAT]"
> ## load package
> library(rentrez)
> ## set maximum records batch
> retmax_set = 183
> ## search pubmed using web history
> search <- entrez_search(
+ db = "pubmed",
+ term = term_set,
+ use_history = T
+ )
> ## get summaries of search hits
> summary <- list(); for (seq_start in seq(0, search$count - 1, retmax_set)) {
+ summary1 <- entrez_summary(
+ db = "pubmed",
+ web_history = search$web_history,
+ retmax = retmax_set,
+ retstart = seq_start
+ )
+ summary <- c(summary, summary1)
+ }
> ## download full XML refs for hits
> XML_refs <- entrez_fetch(
+ db = "pubmed",
+ web_history = search$web_history,
+ rettype = "xml",
+ parsed = TRUE
+ )
>
>
> ## set search term
> term_set = "transcription AND enhancer AND promoter AND 2017:2018[PDAT]"
> ## load package
> library(rentrez)
> ## set maximum records batch
> retmax_set = 184
> ## search pubmed using web history
> search <- entrez_search(
+ db = "pubmed",
+ term = term_set,
+ use_history = T
+ )
> ## get summaries of search hits
> summary <- list(); for (seq_start in seq(0, search$count - 1, retmax_set)) {
+ summary1 <- entrez_summary(
+ db = "pubmed",
+ web_history = search$web_history,
+ retmax = retmax_set,
+ retstart = seq_start
+ )
+ summary <- c(summary, summary1)
+ }
> ## download full XML refs for hits
> XML_refs <- entrez_fetch(
+ db = "pubmed",
+ web_history = search$web_history,
+ rettype = "xml",
+ parsed = TRUE
+ )
Error: HTTP failure: 400
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE eEfetchResult PUBLIC "-//NLM//DTD efetch 20131226//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20131226/efetch.dtd">
<eFetchResult>
<ERROR>Cannot retrieve history data. query_key: 1, WebEnv: NCID_1_51629226_130.14.18.34_9001_1531773486_1795859931_0MetA0_S_MegaStore, retstart: 0, retmax: 552</ERROR>
<ERROR>Can't fetch uids from history because of: NCBI C++ Exception:
Error: UNK_MODULE(CException::eInvalid) "UNK_FILE", line 18446744073709551615: UNK_FUNC ---
</ERROR>
</eFetchResult>
>
>
> ## set search term
> term_set = "transcription AND enhancer AND promoter AND 2017:2018[PDAT]"
> ## load package
> library(rentrez)
> ## set maximum records batch
> retmax_set = 185
> ## search pubmed using web history
> search <- entrez_search(
+ db = "pubmed",
+ term = term_set,
+ use_history = T
+ )
> ## get summaries of search hits
> summary <- list(); for (seq_start in seq(0, search$count - 1, retmax_set)) {
+ summary1 <- entrez_summary(
+ db = "pubmed",
+ web_history = search$web_history,
+ retmax = retmax_set,
+ retstart = seq_start
+ )
+ summary <- c(summary, summary1)
+ }
> ## download full XML refs for hits
> XML_refs <- entrez_fetch(
+ db = "pubmed",
+ web_history = search$web_history,
+ rettype = "xml",
+ parsed = TRUE
+ )
Error: HTTP failure: 400
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE eEfetchResult PUBLIC "-//NLM//DTD efetch 20131226//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20131226/efetch.dtd">
<eFetchResult>
<ERROR>Cannot retrieve history data. query_key: 1, WebEnv: NCID_1_52654089_130.14.22.215_9001_1531773493_484860305_0MetA0_S_MegaStore, retstart: 0, retmax: 552</ERROR>
<ERROR>Can't fetch uids from history because of: NCBI C++ Exception:
Error: UNK_MODULE(CException::eInvalid) "UNK_FILE", line 18446744073709551615: UNK_FUNC ---
</ERROR>
</eFetchResult>

Answered 2019-02-08 17:10:49
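It turns out that rentrez uses 0-based counting, so 552 records correspond to retstart values 0 to 551. Because my code was looking up values 1 to 552, it missed the first record (#0) and then threw an error when it looked for the nonexistent record #552.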
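A minimal R sketch of 0-based paging, assuming the E-utilities convention that retstart = 0 is the first record. batch_starts() and fetch_pubmed_xml() are hypothetical helpers, not rentrez functions; retmax and retstart are passed to rentrez::entrez_fetch() through its `...` argument as extra efetch query terms.

```r
# Hypothetical helper (not part of rentrez): 0-based retstart offsets that
# cover positions 0 .. count - 1 in batches of size `retmax`.
batch_starts <- function(count, retmax) {
  stopifnot(count >= 0, retmax >= 1)
  if (count == 0) return(numeric(0))
  seq(0, count - 1, by = retmax)
}

# The mistake described above, for 552 records in batches of 184:
count <- 552
retmax <- 184
zero_based <- batch_starts(count, retmax)  # 0, 184, 368
one_based <- seq(1, count, by = retmax)    # 1, 185, 369
# With 1-based starts, position 0 is never requested, and the last batch
# would run up to position 369 + 184 - 1 = 552, which does not exist.
last_requested <- max(one_based) + retmax - 1

# Hypothetical sketch: fetch PubMed XML from a web history in 0-based batches.
# Requires the rentrez package; returns a list of unparsed XML strings.
fetch_pubmed_xml <- function(web_history, count, retmax = 100) {
  if (!requireNamespace("rentrez", quietly = TRUE)) {
    stop("the rentrez package is required")
  }
  lapply(batch_starts(count, retmax), function(start) {
    rentrez::entrez_fetch(
      db = "pubmed",
      web_history = web_history,
      rettype = "xml",
      # integers avoid values like 1e+05 being written into the query
      retmax = as.integer(retmax),
      retstart = as.integer(start)
    )
  })
}
```

With a web history from entrez_search(..., use_history = TRUE), a call such as fetch_pubmed_xml(search$web_history, search$count, retmax = 182) would request batches starting at 0, 182, 364 and 546, so the last batch ends at record 551.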
https://stackoverflow.com/questions/51370248