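Q: Web scraping with a for loop in R

Stack Overflow user

Asked 2019-02-20 10:31:10

Answers: 1, Views: 74, Followers: 0, Votes: 0

I want to scrape data from this link, and I wrote the following R code to do it. However, it does not work: it only returns the first page of results. Evidently the loop is not doing what I intended. Does anyone know what is wrong with the loop?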

library('rvest')

for (i in 1:40) {

     webpage <- read_html(paste0("http://search.beaconforfreedom.org/search/censored_publications/result.html?author=&cauthor=&title=&country=7327&language=&censored_year=&censortype=&published_year=&censorreason=&sort=t&page=, i"))

     rank_data_html <- html_nodes(webpage,'tr+ tr td:nth-child(1)')

     rank_data <- html_text(rank_data_html)

     rank_data<-as.numeric(rank_data)

     title_data_html <- html_nodes(webpage,'.censo_list font')

     title_data <- html_text(title_data_html)

     author_data_html <- html_nodes(webpage,'.censo_list+ td font')
     author_data <- html_text(author_data_html)

     country_data_html <- html_nodes(webpage,'.censo_list~ td:nth-child(4) font')

     rcountry_data <- html_text(country_data_html)

     year_data_html <- html_nodes(webpage,'tr+ tr td:nth-child(5) font')

     year_data <- html_text(year_data_html)

     type_data_html <- html_nodes(webpage,'tr+ tr td:nth-child(6) font')

     type_data <- html_text(type_data_html)

}

censorship_df<-data.frame(Rank = rank_data, Title = title_data, Author = author_data, Country = rcountry_data, Type = type_data, Year = year_data)

write.table(censorship_df, file="sample.csv",sep=",",row.names=F)

loops

web-scraping

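1 Answer

Stack Overflow user

Answered 2019-02-20 10:43:06

Are you sure the loop itself is the problem? I would expect it to fetch the first page of results 40 times. Look at this line: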

webpage <- read_html(paste0("http://search.beaconforfreedom.org/search/censored_publications/result.html?author=&cauthor=&title=&country=7327&language=&censored_year=&censortype=&published_year=&censorreason=&sort=t&page=, i"))

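Shouldn't it be the following? (The difference is in the last ten characters of the string: the closing quotation mark has moved.)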

webpage <- read_html(paste0("http://search.beaconforfreedom.org/search/censored_publications/result.html?author=&cauthor=&title=&country=7327&language=&censored_year=&censortype=&published_year=&censorreason=&sort=t&page=", i))

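In R, paste0 concatenates its arguments into one string with no separator. In your code, however, it receives only a single string, because the ", i" sits inside the quotation marks. As a result, every iteration requests the literal URL ending in page=, i. What you want is for it to request page=1 through page=40. So close the quotation mark before the comma, as in page=", i, so that paste0 joins the URL and the value of i.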
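The effect of the quote placement can be checked without touching the network. The snippet below is a minimal sketch in base R; the shortened string "result.html?page=" is a stand-in for the full URL in the question and is used only to keep the output readable.

```r
# "result.html?page=" is a shortened stand-in for the full URL (assumption for illustration).
i <- 3

# Closing quote AFTER ", i": paste0 gets one argument, so i is never substituted.
wrong <- paste0("result.html?page=, i")

# Closing quote BEFORE ", i": paste0 gets two arguments and joins them with no separator.
right <- paste0("result.html?page=", i)

print(wrong)  # "result.html?page=, i"
print(right)  # "result.html?page=3"

# paste0 is vectorized, so all page URLs can also be built in one call.
urls <- paste0("result.html?page=", 1:3)
print(urls)   # "result.html?page=1" "result.html?page=2" "result.html?page=3"
```

Inside the for loop, the second form yields a different URL on each iteration, which is what the question intended.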

I am not an R programmer, but this looked simple enough to me.

Source for the paste0 behavior.
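One further point that follows from the code in the question: even with the URL corrected, the loop reassigns rank_data, title_data, and the other vectors on every iteration, so the data frame built after the loop would contain only the last page. A minimal sketch of one way to keep every page is shown below. The function scrape_page is a hypothetical stand-in that returns dummy rows; a real version would hold the read_html and html_nodes calls from the question and return one data frame per page.

```r
# Hypothetical stand-in for the per-page scraping code (assumption: not from the question).
# A real version would call read_html() on the page URL and extract the columns.
scrape_page <- function(i) {
  data.frame(
    Page  = i,
    Title = paste0("title-", i, "-", 1:2),  # two dummy rows per page
    stringsAsFactors = FALSE
  )
}

n_pages <- 3                          # the question loops over 40 pages
pages <- vector("list", n_pages)      # one slot per page

for (i in seq_len(n_pages)) {
  pages[[i]] <- scrape_page(i)        # store each page instead of overwriting
}

# Stack the per-page data frames into a single data frame.
censorship_df <- do.call(rbind, pages)
print(nrow(censorship_df))            # 6
```

Collecting the pages in a preallocated list and binding once at the end avoids growing a data frame inside the loop.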

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/54777964
