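Q: Extracting names in C# using HttpWebRequests

Stack Overflow user

Asked 2012-11-11 07:20:45

Answers 3 · Views 176 · Followers 0 · Votes 0

I'm an anime fan and I want to get a complete list of all anime characters, so I came across this site: http://www.animevice.com/characters/?page=1 My goal is to extract all the names and add them to listBox1. Here is my current code: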

try
{
    while (true)
    {
        HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create("http://www.animevice.com/characters/?page=" + n);
        req.Method = "GET";
        req.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20100101 Firefox/15.0";
        req.KeepAlive = true;

        HttpWebResponse response = (HttpWebResponse)req.GetResponse();
        Stream responseData = response.GetResponseStream();
        StreamReader reader = new StreamReader(responseData);
        string responseFromServer = reader.ReadToEnd();
        string m = "<a href=\"(.*)\" class=\"name\">(.*)</a>";
        Match match = Regex.Match(responseFromServer, m, RegexOptions.IgnoreCase);
        if (match.Success)
        {
            listBox1.Items.Add(match.Groups[2].Value.ToString());
        }
        if (listBox1.Items.Count % 50 == 0)
        {
            n++;
        }
    }
}
catch { }

However, this gives me the first name in the list (Monkey D. Luffy) many times over. Is there a solution? Cheers

Tags: extract, names, web, response

3 Answers

Stack Overflow user

Answered 2012-11-11 07:36:11

I would use a real HTML parser such as HtmlAgilityPack to parse the HTML, rather than a regular expression.

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(responseFromServer);
var names = doc.DocumentNode.SelectNodes("//a[@class='name']")
                .Select(a=>a.InnerText)
                .ToList();

listBox1.DataSource = names;
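
A slightly more defensive variant of the snippet above, as a sketch only. It assumes the HtmlAgilityPack NuGet package is referenced; the `NameParser` class and `ParseNames` method are hypothetical names introduced for illustration. `SelectNodes` returns null when the XPath matches nothing, so the result is checked before calling LINQ on it, and `HtmlEntity.DeEntitize` decodes entities such as `&amp;` in the link text.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class NameParser
{
    // Returns the text of every <a class="name"> link in the given HTML.
    public static List<string> ParseNames(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes yields null (not an empty collection) when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@class='name']");
        if (links == null)
        {
            return new List<string>();
        }

        // Decode HTML entities and trim surrounding whitespace from each name.
        return links
            .Select(a => HtmlEntity.DeEntitize(a.InnerText).Trim())
            .ToList();
    }
}
```

With this in place, `listBox1.DataSource = NameParser.ParseNames(responseFromServer);` would simply bind an empty list for a page that has no matching links.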

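Votes: 1

Stack Overflow user

Answered 2012-11-11 07:30:04

You are only reading one name from each page, because Regex.Match returns just the first match.

Instead of: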

Match match = Regex.Match(responseFromServer, m, RegexOptions.IgnoreCase);
if (match.Success)
{
    listBox1.Items.Add(match.Groups[2].Value.ToString());
}
if (listBox1.Items.Count % 50 == 0)
{
    n++;
}

Try this:

// Regex.Matches returns every match on the page, not just the first one.
var matches = Regex.Matches(responseFromServer, m, RegexOptions.IgnoreCase);
foreach (var item in matches)
{
    var match = item as Match;
    if (match.Success)
    {
        listBox1.Items.Add(match.Groups[2].Value.ToString());
    }
    if (listBox1.Items.Count % 50 == 0)
    {
        n++;
    }
}
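
A self-contained sketch of the same idea that can be run without the website. The sample markup, the `NameExtractor` class, and the `ExtractNames` method are made up for illustration. It also swaps the greedy `(.*)` groups for lazy `(.*?)` ones, which is a change from the pattern in the question: with greedy groups, two links on the same line can be swallowed into a single match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class NameExtractor
{
    // Lazy quantifiers (.*?) keep each match inside a single anchor tag.
    private static readonly Regex NameLink = new Regex(
        "<a href=\"(.*?)\" class=\"name\">(.*?)</a>",
        RegexOptions.IgnoreCase);

    // Collects the link text (capture group 2) of every match in the HTML.
    public static List<string> ExtractNames(string html)
    {
        var names = new List<string>();
        foreach (Match match in NameLink.Matches(html))
        {
            names.Add(match.Groups[2].Value);
        }
        return names;
    }

    static void Main()
    {
        // Hypothetical sample markup with two links on the same line.
        string html =
            "<a href=\"/a/\" class=\"name\">Monkey D. Luffy</a> " +
            "<a href=\"/b/\" class=\"name\">Roronoa Zoro</a>";

        foreach (string name in ExtractNames(html))
        {
            Console.WriteLine(name);
        }
    }
}
```

In the form from the question, the returned list could be added in one call with `listBox1.Items.AddRange(names.ToArray());`.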

Votes: 0

Stack Overflow user

Answered 2012-11-11 07:32:30

// Read the response one line at a time and test each line for a name link.
using (StreamReader reader = new StreamReader(responseData))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        string m = "<a href=\"(.*)\" class=\"name\">(.*)</a>";
        Match match = Regex.Match(line, m, RegexOptions.IgnoreCase);
        if (match.Success)
        {
            listBox1.Items.Add(match.Groups[2].Value.ToString());
        }
    }
}
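
The snippet above covers a single response; the paging loop from the question still has to decide when to request the next page and when to stop. One possible shape, as a sketch only: the `CharacterCrawler` class, its method names, and the `maxPages` limit are hypothetical, and it assumes that a page past the end of the listing contains no `class="name"` links, which has not been checked against the site.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

static class CharacterCrawler
{
    private const string Pattern = "<a href=\"(.*?)\" class=\"name\">(.*?)</a>";

    // Downloads one listing page; the using blocks dispose the response and reader.
    public static string Download(int page)
    {
        var req = (HttpWebRequest)WebRequest.Create(
            "http://www.animevice.com/characters/?page=" + page);
        req.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20100101 Firefox/15.0";

        using (var response = (HttpWebResponse)req.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    // Advances exactly one page per iteration and stops at the first page
    // without any name links (assumed to mean the listing has ended).
    public static List<string> CollectAll(Func<int, string> fetchPage, int maxPages)
    {
        var all = new List<string>();
        for (int page = 1; page <= maxPages; page++)
        {
            string html = fetchPage(page);
            MatchCollection matches = Regex.Matches(html, Pattern, RegexOptions.IgnoreCase);
            if (matches.Count == 0)
            {
                break;
            }
            foreach (Match match in matches)
            {
                all.Add(match.Groups[2].Value);
            }
        }
        return all;
    }
}
```

A call such as `CharacterCrawler.CollectAll(CharacterCrawler.Download, 100)` would then replace the `while (true)` loop. Passing the download step in as a delegate also lets the loop be exercised with canned HTML instead of live requests.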

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/13327059
