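I'm trying to scrape a website to get its data. So far I can at least connect to the site, but when I try to set the text box's text to the data, all I get is a run of:
HtmlAgilityPack.HtmlNodeCollectionHtmlAgilityPack.HtmlNodeCollection
repeated as many times as there are data items. Here is my code (I know it's a bit sloppy):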
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;
using System.Windows.Forms;
using System;
using HtmlAgilityPack;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        string choice;

        public Form1()
        {
            InitializeComponent();
        }

        public void comboBox1_SelectedIndexChanged(object sender, System.EventArgs e)
        {
        }

        public void button1_Click(object sender, System.EventArgs e)
        {
            HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
            htmlDoc.OptionFixNestedTags = true;
            string urlToLoad = "http://www.nbcwashington.com/weather/school-closings/";
            HttpWebRequest request = HttpWebRequest.Create(urlToLoad) as HttpWebRequest;
            request.Method = "GET";
            Console.WriteLine(request.RequestUri.AbsoluteUri);
            WebResponse response = request.GetResponse();
            htmlDoc.Load(response.GetResponseStream(), true);
            if (htmlDoc.DocumentNode != null)
            {
                var articleNodes = htmlDoc.DocumentNode.SelectNodes("/html/body/div/div/div/div/div/div/p");
                if (articleNodes != null && articleNodes.Any())
                {
                    foreach (var articleNode in articleNodes)
                    {
                        textBox1.AppendText(htmlDoc.DocumentNode.SelectNodes("/html/body/div/div/div/div/div/div/p").ToString());
                    }
                }
            }
            Console.ReadLine();
        }

        private void listBox1_SelectedIndexChanged(object sender, System.EventArgs e)
        {
            choice = listBox1.SelectedItem.ToString();
        }
    }
}
What am I missing or doing wrong? The data should come back looking something like this:
Warren County Public Schools Closed
Washington Adventist University Closing at Noon
Thanks for taking a look.
Posted 2016-01-23 00:33:07
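Never mind, I found the problem. I think I was trying to grab the document node rather than its inner text. Here is the code in case anyone wants it.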
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;
using System.Windows.Forms;
using System;
using HtmlAgilityPack;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        string choice;

        public Form1()
        {
            InitializeComponent();
        }

        public void comboBox1_SelectedIndexChanged(object sender, System.EventArgs e)
        {
        }

        public void button1_Click(object sender, System.EventArgs e)
        {
            HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
            htmlDoc.OptionFixNestedTags = true;
            string urlToLoad = "http://www.nbcwashington.com/weather/school-closings/";
            HttpWebRequest request = HttpWebRequest.Create(urlToLoad) as HttpWebRequest;
            request.Method = "GET";
            Console.WriteLine(request.RequestUri.AbsoluteUri);

            // Dispose the response once the document has been parsed.
            using (WebResponse response = request.GetResponse())
            {
                htmlDoc.Load(response.GetResponseStream(), true);
            }

            if (htmlDoc.DocumentNode != null)
            {
                var articleNodes = htmlDoc.DocumentNode.SelectNodes("/html/body/div/div/div/div/div/div/p");
                if (articleNodes != null && articleNodes.Any())
                {
                    foreach (var articleNode in articleNodes)
                    {
                        // Append each node's text, not the collection's ToString().
                        textBox1.AppendText(articleNode.InnerText + Environment.NewLine);
                    }
                }
            }
        }

        private void listBox1_SelectedIndexChanged(object sender, System.EventArgs e)
        {
            choice = listBox1.SelectedItem.ToString();
        }
    }
}
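For anyone wondering why the first version printed HtmlAgilityPack.HtmlNodeCollection over and over: that output is consistent with the default Object.ToString(), which returns the type's full name when a class does not override it. Here is a rough sketch that reproduces the effect without the library. NodeList, ToStringDemo, WholeCollection and EachItem are hypothetical names made up for this illustration; they are not part of HtmlAgilityPack.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a node collection; it does not override ToString().
public class NodeList : List<string> { }

public static class ToStringDemo
{
    // Calls ToString() on the collection itself, as the original loop did.
    public static string WholeCollection(NodeList nodes)
    {
        return nodes.ToString();
    }

    // Reads each item instead, as the corrected loop does with InnerText.
    public static string EachItem(NodeList nodes)
    {
        return string.Join(Environment.NewLine, nodes);
    }

    public static void Main()
    {
        var nodes = new NodeList
        {
            "Warren County Public Schools Closed",
            "Washington Adventist University Closing at Noon"
        };

        Console.WriteLine(WholeCollection(nodes)); // the type name, not the data
        Console.WriteLine(EachItem(nodes));        // one entry per line
    }
}
```

The first call yields only the type name, however many items the collection holds, which is why the fix is to read a property of each node inside the loop.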
Posted 2016-01-23 00:33:44
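Because articleNodes already contains the nodes you're interested in, there's no need to call SelectNodes() again inside the loop.
Do keep the null check, though: SelectNodes() returns null, not an empty collection, when nothing matches the XPath.
Try this, accessing the InnerHtml (or InnerText) property:
var articleNodes = htmlDoc.DocumentNode.SelectNodes("/html/body/div/div/div/div/div/div/p");
// SelectNodes() returns null when no node matches, so fall back to an empty list.
var result = articleNodes == null
    ? new List<string>()
    : articleNodes.Select(x => x.InnerHtml.Replace("<br><span>", " ")
                                           .Replace(" </span>", "")).ToList();
https://stackoverflow.com/questions/34958093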
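To try the select-once, read-each-node idea without hitting the network, here is a minimal sketch that parses an inline HTML string. It assumes the HtmlAgilityPack NuGet package is installed. ClosingsParser, ExtractText and the sample markup are made up for illustration, and the real page's structure may well differ. It also guards against SelectNodes() returning null when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package, assumed to be installed

public static class ClosingsParser
{
    // Returns the trimmed text of every node matched by the XPath,
    // or an empty list when nothing matches.
    public static List<string> ExtractText(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select once; SelectNodes returns null when no node matches.
        var nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return new List<string>();
        }

        return nodes
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim()) // decode &amp; and friends
            .Where(text => text.Length > 0)
            .ToList();
    }

    public static void Main()
    {
        // Made-up markup for illustration only.
        const string sample =
            "<html><body><div>" +
            "<p>Warren County Public Schools Closed</p>" +
            "<p>Washington Adventist University Closing at Noon</p>" +
            "</div></body></html>";

        foreach (var line in ExtractText(sample, "//div/p"))
        {
            Console.WriteLine(line);
        }
    }
}
```

A relative XPath such as //div/p is used here only because the sample is tiny. Against the live page, a long absolute path like the one in the question tends to break whenever the layout changes, so a more specific selector may hold up better.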