HtmlAgilityPack XPath returns HtmlAgilityPack.HtmlNodeCollection

Stack Overflow user
Asked 2016-01-23 00:15:29
Answers: 2 · Views: 628 · Followers: 0 · Votes: 0

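I'm trying to scrape a website for its data. So far I can at least connect to the site, but when I try to set the text box's text to the data, all I get is a bunch of: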

HtmlAgilityPack.HtmlNodeCollection

HtmlAgilityPack.HtmlNodeCollection appears once for every data item on the page. Here is my code (I know it's a bit sloppy):

using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;
using System.Windows.Forms;
using System;
using HtmlAgilityPack;

namespace WindowsFormsApplication1
{
public partial class Form1 : Form
{
    string choice;

    public Form1()
    {
        InitializeComponent();
    }

    public void comboBox1_SelectedIndexChanged(object sender, System.EventArgs e)
    {

    }

    public void button1_Click(object sender, System.EventArgs e)
    {
        HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
        htmlDoc.OptionFixNestedTags = true;

        string urlToLoad = "http://www.nbcwashington.com/weather/school-closings/";
        HttpWebRequest request = HttpWebRequest.Create(urlToLoad) as HttpWebRequest;
        request.Method = "GET";

        Console.WriteLine(request.RequestUri.AbsoluteUri);
        WebResponse response = request.GetResponse();

        htmlDoc.Load(response.GetResponseStream(), true);
        if (htmlDoc.DocumentNode != null)
        {
            var articleNodes = htmlDoc.DocumentNode.SelectNodes("/html/body/div/div/div/div/div/div/p");



           if (articleNodes != null && articleNodes.Any())
            {
                foreach (var articleNode in articleNodes)
                {

                    textBox1.AppendText(htmlDoc.DocumentNode.SelectNodes("/html/body/div/div/div/div/div/div/p").ToString());

                }
            }
        }

        Console.ReadLine();  
    }

    private void listBox1_SelectedIndexChanged(object sender, System.EventArgs e)
    {
        choice = listBox1.SelectedItem.ToString();
    }



}
}

What am I missing or doing wrong? The data should come back looking something like this:

Warren County Public Schools Closed 
Washington Adventist University Closing at Noon

Thanks for taking a look.


2 Answers

Stack Overflow user

Answered 2016-01-23 00:33:07

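Never mind, I found the problem. I was appending the node collection itself instead of each node's inner text. Here is the code in case anyone wants it.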

using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;
using System.Windows.Forms;
using System;
using HtmlAgilityPack;

namespace WindowsFormsApplication1
{
public partial class Form1 : Form
{
    string choice;

    public Form1()
    {
        InitializeComponent();
    }

    public void comboBox1_SelectedIndexChanged(object sender, System.EventArgs e)
    {

    }

    public void button1_Click(object sender, System.EventArgs e)
    {
        HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
        htmlDoc.OptionFixNestedTags = true;

        string urlToLoad = "http://www.nbcwashington.com/weather/school-closings/";
        HttpWebRequest request = HttpWebRequest.Create(urlToLoad) as HttpWebRequest;
        request.Method = "GET";

        Console.WriteLine(request.RequestUri.AbsoluteUri);
        WebResponse response = request.GetResponse();

        htmlDoc.Load(response.GetResponseStream(), true);
        if (htmlDoc.DocumentNode != null)
        {
            var articleNodes = htmlDoc.DocumentNode.SelectNodes("/html/body/div/div/div/div/div/div/p");



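            if (articleNodes != null && articleNodes.Any())
            {
                foreach (var articleNode in articleNodes)
                {
                    // Append the text of the current node, not the whole collection.
                    // A WinForms TextBox needs "\r\n" (Environment.NewLine) to break lines.
                    textBox1.AppendText(articleNode.InnerText + Environment.NewLine);
                }
            }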
        }

        Console.ReadLine();  
    }

    private void listBox1_SelectedIndexChanged(object sender, System.EventArgs e)
    {
        choice = listBox1.SelectedItem.ToString();
    }



}

}
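
To see why the original loop printed the type name, here is a minimal console sketch. The inline markup, the `//p` XPath, and the class name `ToStringDemo` are assumptions made for illustration; they are not the real page or the asker's code. Calling `ToString()` on the collection yields the type name, as the question observed, while `InnerText` on each node yields the text.

```csharp
using System;
using HtmlAgilityPack;

class ToStringDemo
{
    static void Main()
    {
        // Hypothetical markup standing in for the real page.
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<html><body>" +
            "<p>Warren County Public Schools Closed</p>" +
            "<p>Washington Adventist University Closing at Noon</p>" +
            "</body></html>");

        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//p");

        // The collection does not override ToString(), so this prints its type name.
        Console.WriteLine(nodes.ToString());

        // Each element is an HtmlNode; InnerText holds the text it contains.
        foreach (HtmlNode node in nodes)
        {
            Console.WriteLine(node.InnerText);
        }
    }
}
```

In the question's loop, the loop variable `articleNode` was never used, which is why the same type name appeared once per matched node.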

Votes: 0

Stack Overflow user

Answered 2016-01-23 00:33:44

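Since articleNodes already contains the nodes you're interested in, there's no need to call SelectNodes() again inside the loop.

Keep the null check, though: in HtmlAgilityPack, SelectNodes() returns null, not an empty collection, when no nodes match the XPath. The Any() call is the part you can drop.

Try accessing the InnerHtml (or InnerText) property of each node:

var articleNodes = htmlDoc.DocumentNode.SelectNodes("/html/body/div/div/div/div/div/div/p");

// SelectNodes() returns null when nothing matches, so fall back to an empty list.
var result = articleNodes == null
    ? new List<string>()
    : articleNodes.Select(x => x.InnerHtml.Replace("<br><span>", " ")
                                          .Replace(" </span>", "")).ToList();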
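
The null handling can be wrapped in a small helper. The sketch below is one possible shape, not part of the original answer: the helper name `ExtractText`, the class name, and the inline markup are hypothetical. It uses `InnerText` rather than the `InnerHtml` plus `Replace` approach, so on the real page, text separated only by a `<br>` may run together.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

class InnerTextDemo
{
    // Hypothetical helper: returns the trimmed text of every node matched by
    // xpath, or an empty list when SelectNodes() finds nothing and returns null.
    static List<string> ExtractText(HtmlDocument doc, string xpath)
    {
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return new List<string>();
        }

        // DeEntitize turns entities such as &amp; back into plain characters.
        return nodes.Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim()).ToList();
    }

    static void Main()
    {
        // Made-up markup for illustration only.
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<html><body>" +
            "<p>Warren County Public Schools Closed</p>" +
            "<p>Washington Adventist University Closing at Noon</p>" +
            "</body></html>");

        foreach (string line in ExtractText(doc, "//p"))
        {
            Console.WriteLine(line);
        }

        // No <table> exists in the markup, so the helper returns an empty list
        // instead of throwing a NullReferenceException.
        Console.WriteLine(ExtractText(doc, "//table").Count);
    }
}
```

Returning an empty list keeps callers free of null checks, and it also makes a relative XPath such as `//p` easier to try than the long absolute path, which breaks whenever the page layout changes.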
Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/34958093
