Given the following data frame, URLS:
>dput(droplevels(head(URLS, 5)))
URLS <- structure(list(URLS = structure(c(3L, 1L, 4L, 5L, 2L), .Label = c("http://www.example.com/cms/en/product/channel.html?channel=db3a30433580b37101359f8ee6963814#Anker&ic=0590001",
"http://www.example.com/cms/en/product/power/igbt/igbt-discrete/discrete-igbt-with-anti-parallel-diode/600v-and-1200v-trenchstop/channel.html?channel=db3a3043397219b6013977d62aa15462&ic=0590001",
"http://www.example.com/cms/en/product/power/lighting-ics-and-audio-driver-ics/dc-dc-led-driver-ic-and-linear-control-solutions/CDM10V/productType.html?productType=5546d46253f65057015414dc7d576130&ic=0590001",
"http://www.example.com/cms/en/product/promopages/pcim?ic=0590001",
"http://www.example.com/dgdl/example-ApplicationNote_600V_TRENCHSTOP_Performance_IGBT.pdf-AN-v01_00-EN.pdf?fileId=5546d46253f65057015452d6317a71df&ic=0590001"
), class = "factor")), .Names = "URLS", row.names = c(NA,
5L), class = "data.frame")

I want to create a vector containing all the unique parameter names that appear in the URLs. For example, for this data frame the output should be:
parameters <- c("channel","ic","productType","fileId")

My real data frame has more than 10,000 observations, so doing this by hand is not a viable option.

Answered 2016-06-16 17:51:47
You can try the urltools package:
library(urltools)
url_parse(URLS$URLS)

To get the parameters:
url_parse(URLS$URLS)$parameter
#[1] "productType=5546d46253f65057015414dc7d576130&ic=0590001"
#[2] "channel=db3a30433580b37101359f8ee6963814"
#[3] "ic=0590001"
#[4] "fileId=5546d46253f65057015452d6317a71df&ic=0590001"
#[5] "channel=db3a3043397219b6013977d62aa15462&ic=0590001"

or
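pars <- parameters(URLS$URLS)
# split on "&" first so every parameter name is kept, not only the first one in each URL
unique(sub('=.*', '', unlist(strsplit(pars, '&', fixed = TRUE))))
#[1] "productType" "ic" "channel" "fileId"

Answered 2016-06-16 17:58:46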
You can use getFormParams() from the RCurl package to get the named parameter values, then take just the names.
params <- lapply(URLS$URLS, function(x) names(RCurl::getFormParams(x)))
unique(unlist(params))
# [1] "productType" "ic" "channel" "fileId"

Source: https://stackoverflow.com/questions/37866066
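
Both answers above rely on a package. If adding a dependency is not an option, the same idea can be sketched in base R. This is only a rough illustration, not the method from either answer: the helper name query_param_names is hypothetical, the sample URLs are shortened stand-ins for the ones in the question, and the sketch assumes that anything after "#" is a fragment and should be ignored (so the "ic" in "#Anker&ic=0590001" is not counted for that URL).

```r
# Hypothetical helper (not from either answer): collect the unique
# query-parameter names from a vector of URLs using base R only.
query_param_names <- function(urls) {
  urls <- as.character(urls)                  # factor columns become character
  urls <- sub("#.*$", "", urls)               # assumption: drop the fragment part
  has_query <- grepl("?", urls, fixed = TRUE) # keep only URLs with a query string
  query <- sub("^[^?]*\\?", "", urls[has_query])      # text after the first "?"
  pairs <- unlist(strsplit(query, "&", fixed = TRUE)) # one "name=value" per element
  pairs <- pairs[nzchar(pairs)]               # ignore empty pieces such as "a=1&&b=2"
  unique(sub("=.*$", "", pairs))              # strip the values, keep each name once
}

# Shortened, illustrative versions of the URLs in the question
urls <- c(
  "http://www.example.com/a/productType.html?productType=5546&ic=0590001",
  "http://www.example.com/a/channel.html?channel=db3a#Anker&ic=0590001",
  "http://www.example.com/a/pcim?ic=0590001",
  "http://www.example.com/a/file.pdf?fileId=5546&ic=0590001"
)

query_param_names(urls)
#[1] "productType" "ic" "channel" "fileId"
```

Because every step is vectorised, this should be fine for a column of 10,000+ URLs. It does not decode percent-encoded names, so a dedicated parser is probably the safer choice for messy input.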