How do I parse the rel="canonical" tag, including its URL, from an HTML document?
I want to extract the URL from this element:
<link rel="canonical" href="http://stackoverflow.com/questions/2593147/html-agility-pack-make-code-look-neat" />
Posted on 2012-11-19 20:44:52
Assume doc is your HtmlDocument object.
    HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//link[@rel]");
This should give you the link elements that have a rel attribute. Now iterate over them:
    foreach (HtmlNode link in links)
    {
        string url;
        if (link.Attributes["rel"] == "canonical") {
            url = link.Attributes["href"];
        }
    }
Alternatively, you can filter inside the SelectNodes call itself so that only the links whose rel is "canonical" are returned:
    doc.DocumentNode.SelectNodes("//link[@rel='canonical']");
The code is untested, but you get the idea :)
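The filtered XPath mentioned above can be combined with SelectSingleNode to avoid the loop entirely. The following is a minimal sketch, not a definitive implementation: it assumes the HtmlAgilityPack package is referenced, and the class name CanonicalLink and the method name Find are hypothetical names chosen for illustration. It matches only an exact rel value of "canonical" and returns the href as written in the page, without resolving relative URLs.

```csharp
using HtmlAgilityPack;

public static class CanonicalLink   // hypothetical helper, not part of HtmlAgilityPack
{
    // Returns the href of the first <link rel="canonical"> element,
    // or null when the element or its href attribute is missing.
    public static string Find(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches the XPath.
        HtmlNode link = doc.DocumentNode.SelectSingleNode("//link[@rel='canonical']");

        // The attribute indexer returns null for a missing attribute,
        // so both lookups are guarded with the null-conditional operator.
        return link?.Attributes["href"]?.Value;
    }
}
```

A call such as CanonicalLink.Find(pageSource), where pageSource holds the downloaded HTML, then yields either the canonical URL or null, so the caller only needs a single null check.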
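Posted on 2016-02-08 02:49:11
The accepted answer is no longer correct: the Value property of each attribute has to be read explicitly. The updated code is as follows: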
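    var links = htmlDoc.DocumentNode.SelectNodes("//link[@rel]");
    string canonical = null;
    // SelectNodes returns null when no node matches the XPath
    if (links != null)
    {
        foreach (HtmlNode link in links)
        {
            if (link.Attributes["rel"].Value == "canonical")
            {
                // href may be absent, so guard the lookup
                canonical = link.Attributes["href"]?.Value;
            }
        }
    }
Posted on 2012-11-19 20:45:40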
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(_html);
    // Requires "using System.Linq;". Yields null when no canonical link exists.
    String link = (from x in doc.DocumentNode.Descendants()
                   where x.Name == "link"
                   && x.Attributes["rel"] != null
                   && x.Attributes["rel"].Value == "canonical"
                   && x.Attributes["href"] != null
                   select x.Attributes["href"].Value).FirstOrDefault();
https://stackoverflow.com/questions/13453980