I'd like to know why my get_http_status function doesn't iterate more than once, causing an error
I have a data frame like this:
> str(df5)
'data.frame': 10 obs. of 3 variables:
$ text : chr "\n" "\n" "\n" "\n" ...
$ enlace: chr "//www.blogger.com| __truncated__ ...
$ Freq : int 1 1 1 1 1 1 1 1 1
I'm trying to get the HTTP status code for each "enlace" with the following function:
library(httr)   # HEAD()
library(dplyr)  # mutate()

get_http_status <- function(url){
  if (!is.null(url)){
    Sys.sleep(3)
    print(url)
    ret <- HEAD(url)
    return(ret$status_code)
  }
  return("")
}
df44 <- mutate(df5, status = get_http_status(enlace))
But it keeps throwing this error:
Error in parse_url(url) : length(url) == 1 is not TRUE
I can wrap the function in tryCatch and then it works, but I don't understand why the error happens in the first place.
get_http_status_2 <- function(url){
  tryCatch(
    expr = {
      Sys.sleep(3)
      print(url)
      ret <- HEAD(url)
      return(ret$status_code)
    },
    error = function(e){
      return("")
    }
  )
}
The contents of df5$enlace are:
> df5$enlace
[1] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=Attribution&widgetId=Attribution1&action=editWidget&sectionId=footer-3"
[2] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=BlogArchive&widgetId=BlogArchive1&action=editWidget&sectionId=sidebar-right-1"
[3] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=BlogSearch&widgetId=BlogSearch1&action=editWidget&sectionId=sidebar-right-1"
[4] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=Followers&widgetId=Followers1&action=editWidget&sectionId=sidebar-right-1"
[5] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=PageList&widgetId=PageList1&action=editWidget&sectionId=crosscol"
[6] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=Text&widgetId=Text1&action=editWidget&sectionId=sidebar-right-1"
[7] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=Text&widgetId=Text2&action=editWidget&sectionId=sidebar-right-1"
[8] "http://5d4a.wordpress.com/2010/08/02/smashing-the-stack-in-2010/"
[9] "http://advancedwindowsdebugging.com/ch06.pdf"
[10] "http://beej.us/guide/
I think it only runs once, because the output of the function is:
> df44 <- mutate(df5, status = get_http_status(enlace))
[1] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=Attribution&widgetId=Attribution1&action=editWidget&sectionId=footer-3"
[2] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=BlogArchive&widgetId=BlogArchive1&action=editWidget&sectionId=sidebar-right-1"
[3] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=BlogSearch&widgetId=BlogSearch1&action=editWidget&sectionId=sidebar-right-1"
[4] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=Followers&widgetId=Followers1&action=editWidget&sectionId=sidebar-right-1"
[5] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=PageList&widgetId=PageList1&action=editWidget&sectionId=crosscol"
[6] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=Text&widgetId=Text1&action=editWidget&sectionId=sidebar-right-1"
[7] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=Text&widgetId=Text2&action=editWidget&sectionId=sidebar-right-1"
[8] "http://5d4a.wordpress.com/2010/08/02/smashing-the-stack-in-2010/"
[9] "http://advancedwindowsdebugging.com/ch06.pdf"
[10] "http://beej.us/guide/bgc/"
Error in parse_url(url) : length(url) == 1 is not TRUE

Answer posted on 2019-12-30 17:07:30
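mutate() evaluates get_http_status(enlace) only once, passing the entire enlace column as url. That is why print(url) shows all ten URLs in a single call, and why HEAD() then fails the length(url) == 1 check: it accepts only one URL at a time. Because your function wraps a function that is not vectorized, use a higher-order function from the apply family to iterate over the vector instead.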
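To see the mechanism without any network access, here is a minimal sketch. fake_head_status() is a hypothetical stand-in for HEAD() that, like the failing check in parse_url(), accepts only a single URL; get_status_demo() and get_status_demo_2() mirror the shape of the two functions in the question, and the example.com URLs are placeholders. Calling the plain version on a whole vector reproduces the same error message, while the tryCatch version does not really "work": the error is still raised, the handler turns it into a single "", and mutate() recycles that one value to every row, so no status codes are fetched at all.

```r
# Hypothetical stand-in for httr::HEAD(): it rejects anything that is not a
# single URL, mimicking the length check that fails inside parse_url().
fake_head_status <- function(url) {
  stopifnot(length(url) == 1)
  200L  # pretend every request succeeds
}

# Same shape as the question's get_http_status(), minus Sys.sleep()/print().
get_status_demo <- function(url) {
  if (!is.null(url)) {
    return(fake_head_status(url))
  }
  ""
}

# Same shape as get_http_status_2(): any error becomes "".
get_status_demo_2 <- function(url) {
  tryCatch(fake_head_status(url), error = function(e) "")
}

# Placeholder URLs; mutate() hands the whole column over like this, in one call.
urls <- c("http://example.com/a", "http://example.com/b", "http://example.com/c")

msg <- tryCatch(get_status_demo(urls), error = function(e) conditionMessage(e))
print(msg)     # the length check fails: 3 URLs were passed, not 1

caught <- get_status_demo_2(urls)
print(caught)  # a single "" standing in for the whole column

# Base-R equivalent of mutate() recycling that single value to every row.
df <- data.frame(enlace = urls)
df$status <- caught
print(df$status)
```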
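Below, get_http_status is called on each element of df5$enlace in turn. vapply() expects every call to return a character vector of length one, character(1); since the status_code returned by HEAD() is an integer, the result is converted with as.character() so that it matches that template.
vapply(df5$enlace, function(u) as.character(get_http_status(u)), character(1))
https://stackoverflow.com/questions/59534287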
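The per-element approach can be sketched offline with the same kind of stub. fake_head_status() and get_status_each() are hypothetical (in real use HEAD() and get_http_status() take their place), and the URLs are placeholders. As an assumption borrowed from get_http_status_2(), the error handling is moved inside the per-URL call, so a failure yields "" for that one URL only; USE.NAMES = FALSE stops vapply() from naming the results after the URLs.

```r
# Hypothetical stand-in for httr::HEAD(): accepts exactly one URL.
fake_head_status <- function(url) {
  stopifnot(length(url) == 1)
  200L  # pretend every request succeeds
}

# Hypothetical per-URL wrapper: converts the integer status to character and,
# if this single request fails, returns "" for this URL only.
get_status_each <- function(url) {
  tryCatch(as.character(fake_head_status(url)), error = function(e) "")
}

df <- data.frame(enlace = c("http://example.com/a", "http://example.com/b"))

# vapply() calls get_status_each() once per element and checks that each
# result is a length-1 character vector.
df$status <- vapply(df$enlace, get_status_each, character(1), USE.NAMES = FALSE)
print(df)
```

With dplyr loaded, the same vapply() call can sit inside mutate() as the right-hand side of status = ..., since it returns one value per row.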