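I am trying to build a Metro app that shows my university's class timetable. I use HAP + Fizzler to parse the page and extract the data.
The timetable link gives me a "Too many automatic redirections" error. I found out that a CookieContainer can help, but I don't know how to use it: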
CookieContainer cc = new CookieContainer();
request.CookieContainer = cc;
My code:
public static HttpWebRequest request;
public string Url = "http://cist.kture.kharkov.ua/ias/app/tt/f?p=778:201:9421608126858:::201:P201_FIRST_DATE,P201_LAST_DATE,P201_GROUP,P201_POTOK:01.09.2012,31.01.2013,2423447,0:";
public SampleDataSource()
{
    HtmlDocument html = new HtmlDocument();
    request = (HttpWebRequest)WebRequest.Create(Url);
    request.Proxy = null;
    request.UseDefaultCredentials = true;
    CookieContainer cc = new CookieContainer();
    request.CookieContainer = cc;
    html.LoadHtml(request.RequestUri.ToString());
    var page = html.DocumentNode;
    String ITEM_CONTENT = null;
    foreach (var item in page.QuerySelectorAll(".MainTT"))
    {
        ITEM_CONTENT = item.InnerHtml;
    }
}
With the CookieContainer I no longer get the error, but for some reason DocumentNode.InnerHtml contains the URI itself rather than the page's HTML.
Posted on 2012-11-29 02:12:21
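You only need to change one line. LoadHtml parses the string it is given, so passing the request URI parses the URL text instead of the page.
Replace
html.LoadHtml(request.RequestUri.ToString());
with
html.LoadHtml(new StreamReader(request.GetResponse().GetResponseStream()).ReadToEnd());
EDIT
First mark your method as async (a constructor cannot be async, so this code has to move out of the SampleDataSource constructor into an async method):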
request.CookieContainer = cc;
var resp = await request.GetResponseAsync();
html.LoadHtml(new StreamReader(resp.GetResponseStream()).ReadToEnd());
Posted on 2013-06-25 00:24:18
If you want to download the page source, try this method (it uses HttpClient):
public async Task<string> DownloadHtmlCode(string url)
{
    HttpClientHandler handler = new HttpClientHandler { UseDefaultCredentials = true, AllowAutoRedirect = true };
    HttpClient client = new HttpClient(handler);
    HttpResponseMessage response = await client.GetAsync(url);
    response.EnsureSuccessStatusCode();
    string responseBody = await response.Content.ReadAsStringAsync();
    return responseBody;
}
Posted on 2013-06-25 01:07:20
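If you want to parse the HTML you downloaded, you can use Regex or LINQ. I have some examples that parse HTML with LINQ, but first you need to load the markup into an HtmlDocument from the HtmlAgilityPack library, like this: html.LoadHtml(temphtml); Once that is done, you can parse the HtmlDocument: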
// Example: parse the image links.
IEnumerable<HtmlNode> imghrefNodes = html.DocumentNode.Descendants().Where(n => n.Name == "img");
foreach (HtmlNode img in imghrefNodes)
{
    HtmlAttribute att = img.Attributes["src"];
    // att is null when the <img> has no src attribute; otherwise att.Value holds the image URL.
    // Here you can do whatever you need with each image link by reading or editing att.Value.
}
https://stackoverflow.com/questions/13611606
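The pieces from the answers above can be combined into one async method. This is only a minimal sketch, assuming the HtmlAgilityPack and Fizzler packages are referenced; the class name TimetableLoader and the method name LoadTimetableHtmlAsync are hypothetical and do not come from the thread. It keeps cookies across redirects, which the question reports was enough to avoid the redirect error on this site, and it passes the downloaded body (not the URL) to LoadHtml.

```csharp
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Fizzler.Systems.HtmlAgilityPack; // QuerySelectorAll extension for HtmlNode
using HtmlAgilityPack;

// Hypothetical helper: the class and method names are illustrative only.
public static class TimetableLoader
{
    public static async Task<string> LoadTimetableHtmlAsync(string url)
    {
        // Store cookies so they are sent again on each redirected request.
        var handler = new HttpClientHandler
        {
            CookieContainer = new CookieContainer(),
            UseCookies = true,
            AllowAutoRedirect = true
        };

        using (var client = new HttpClient(handler))
        {
            // Download the page body as a string.
            string body = await client.GetStringAsync(url);

            // Parse the downloaded HTML rather than the URL text.
            var html = new HtmlDocument();
            html.LoadHtml(body);

            // Mirror the loop in the question: keep the last .MainTT element.
            HtmlNode last = html.DocumentNode.QuerySelectorAll(".MainTT").LastOrDefault();
            return last == null ? null : last.InnerHtml;
        }
    }
}
```

A caller would use it as `string content = await TimetableLoader.LoadTimetableHtmlAsync(Url);` from another async method. The method returns null when the page has no element with the MainTT class.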