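For a school project I need to fetch the source code of ~1000 websites. I use HttpWebRequest inside a for loop. However, more than half of the websites in my list return a 404 error, meaning the site could not be found. When I open the same sites in Chrome, Firefox or Internet Explorer, everything works fine.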
Here is the code I use to fetch the source:
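using System.IO;
using System.Net;
using System.Text;

public string getSource(string url)
{
    string data = null; // "data" was never declared in the original snippet
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    // For 4xx/5xx codes such as 404, GetResponse() throws a WebException
    // instead of returning a response, so the OK check below never sees them
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        if (response.StatusCode == HttpStatusCode.OK)
        {
            Stream receiveStream = response.GetResponseStream();
            // Use the charset announced by the server, if any; otherwise
            // StreamReader falls back to UTF-8 (with byte-order-mark detection)
            using (StreamReader readStream = string.IsNullOrEmpty(response.CharacterSet)
                ? new StreamReader(receiveStream)
                : new StreamReader(receiveStream, Encoding.GetEncoding(response.CharacterSet)))
            {
                data = readStream.ReadToEnd();
            }
        }
    }
    return data;
}
Could it be failing simply because of the sheer volume of ~1000 websites?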
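One detail is worth checking before blaming the volume: HttpWebRequest does not return a 404 as a response with a non-OK StatusCode. GetResponse() throws a WebException, and the server's response, if there was one, is attached as ex.Response. The following is a minimal sketch for logging which URLs fail and with which status; the names StatusProbe and TryGetSource are hypothetical and not part of the original code.

```csharp
using System.IO;
using System.Net;

// Hypothetical helper class, not from the original question
public static class StatusProbe
{
    // Returns the page source, or null on failure.
    // statusCode receives the HTTP status when the server answered (e.g. 404),
    // or stays null when no HTTP response came back (DNS failure, refused connection, timeout).
    public static string TryGetSource(string url, out int? statusCode)
    {
        statusCode = null;
        try
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                statusCode = (int)response.StatusCode;
                return reader.ReadToEnd();
            }
        }
        catch (WebException ex)
        {
            // For 4xx/5xx answers the server's response is attached to the exception
            HttpWebResponse errorResponse = ex.Response as HttpWebResponse;
            if (errorResponse != null)
            {
                statusCode = (int)errorResponse.StatusCode;
                errorResponse.Close();
            }
            return null;
        }
    }
}
```

Calling this for each URL in the for loop and printing the code (or "no response" when it is null) should show whether the failures really are 404s or something else, such as timeouts or name-resolution errors.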
Posted on 2014-11-24 22:19:31
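For many sites you may have to set the user agent to that of a well-known browser, because they reject requests from an unknown "browser". Try this before calling request.GetResponse():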
var agent = "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)";
request.UserAgent = agent; // User-Agent is a restricted header: HttpWebRequest throws an ArgumentException if it is set via Headers.Add
https://stackoverflow.com/questions/27106845
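Folding the answer's suggestion into the question's method might look like the sketch below. It assumes a hypothetical PageFetcher class and keeps the answer's User-Agent string unchanged. The header is set through the UserAgent property, because HttpWebRequest rejects User-Agent in Headers.Add. Sites that still refuse the request will surface as a WebException from GetResponse().

```csharp
using System.IO;
using System.Net;
using System.Text;

// Hypothetical class name; the User-Agent string is the one from the answer
public static class PageFetcher
{
    public const string BrowserAgent = "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)";

    // Builds the request with a browser-like User-Agent; nothing is sent yet
    public static HttpWebRequest CreateRequest(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = BrowserAgent; // restricted header: use the property
        return request;
    }

    // Sends the request and reads the body, honouring the server's charset if present
    public static string GetSource(string url)
    {
        HttpWebRequest request = CreateRequest(url);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = string.IsNullOrEmpty(response.CharacterSet)
            ? new StreamReader(response.GetResponseStream())
            : new StreamReader(response.GetResponseStream(), Encoding.GetEncoding(response.CharacterSet)))
        {
            return reader.ReadToEnd();
        }
    }
}
```

CreateRequest is split out so the headers can be inspected without contacting the server; only GetResponse() actually sends the request.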