I am parsing this HTML table:
<table align="center">
<tbody>
<!-- riadok -->
<tr>
<td valign="middle" align="right">
<form action="130427_0i.htm" method="get">
<input type="submit" class="button" title="uvedení do první modlitby dne" value="Inv.">
</form>
</td>
<td valign="middle" align="center">
<form action="130427_0c.htm" method="get">
<input type="submit" class="button" title="modlitba se čtením" value="Čtení">
</form>
</td>
<td valign="middle" align="left">
<form action="130427_0r.htm" method="get">
<input type="submit" class="button" title="ranní chvály" value="Ranní chvály">
</form>
</td>
</tr>
<!-- riadok -->
<tr>
<td valign="middle" align="right">
<form action="130427_09.htm" method="get">
<input type="submit" class="button" title="modlitba dopoledne" value="9h">
</form>
<form action="130427_09d.htm" method="get">
<input type="submit" class="button" title="modlitba dopoledne (žalmy z doplňovacího cyklu)" value="(alt)">
</form>
</td>
<td valign="middle" align="center">
<form action="130427_02.htm" method="get">
<input type="submit" class="button" title="modlitba v poledne" value="12h">
</form>
<form action="130427_02d.htm" method="get">
<input type="submit" class="button" title="modlitba v poledne (žalmy z doplňovacího cyklu)" value="(alt)">
</form>
</td>
<td valign="middle" align="left">
<form action="130427_03.htm" method="get">
<input type="submit" class="button" title="modlitba odpoledne" value="15h">
</form>
<form action="130427_03d.htm" method="get">
<input type="submit" class="button" title="modlitba odpoledne (žalmy z doplňovacího cyklu)" value="(alt)">
</form>
</td>
</tr>
<!-- riadok -->
<tr>
<td align="right">
<form action="130427_0v.htm" method="get">
<input type="submit" class="button" title="nešpory" value="Nešpory">
</form>
</td>
<td valign="middle" align="center">
<form action="130427_0k.htm" method="get">
<input type="submit" class="button" title="kompletář" value="Kompl.">
</form>
</td>
</tr>
<!-- riadok -->
<tr>
<td align="right"></td>
</tr>
</tbody>
</table>
I need to get each form (together with its input) in a single HtmlNode. For example:
<form action="130427_0c.htm" method="get">
<input type="submit" class="button" title="modlitba se čtením" value="Čtení">
</form>
With my code, I only get the following:
<form action="130427_0c.htm" method="get">
My code:
public static class FromHtmlTableToHtmlNodeList
{
    static List<List<HtmlNode>> tableOfNode = new List<List<HtmlNode>>();

    public static List<List<HtmlNode>> Do(string htmltable)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(htmltable);
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes(".//tr");
        for (int i = 0; i < rows.Count; i++)
        {
            int i2 = tableOfNode.Count;
            HtmlNodeCollection cols = rows[i].SelectNodes("./td");
            for (int j = 0; j < cols.Count; j++)
            {
                HtmlNodeCollection inCols = cols[j].SelectNodes("./form/descendant-or-self::*");
                List<HtmlNode> nextRow = new List<HtmlNode>();
                if (inCols != null)
                {
                    for (int k = 0; k < inCols.Count; k++)
                    {
                        if (tableOfNode.Count < i2 + k + 1)
                        {
                            tableOfNode.Add(nextRow);
                        }
                        if (tableOfNode[i2 + k].Count < j + 1) tableOfNode[i2 + k].Insert(j, inCols[k]);
                    }
                }
            }
        }
        return tableOfNode;
    }
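}
I know the problem is in this line:
HtmlNodeCollection inCols = cols[j].SelectNodes("./form/descendant-or-self::*");
What should the XPath look like to get what I want?
Answered on 2013-04-27 19:54:17
By default, Html Agility Pack treats the form element specially. For the reason, see: HtmlAgilityPack -- Does <form> close itself for some reason?
This code should get all the form elements: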
HtmlDocument doc = new HtmlDocument();
HtmlNode.ElementsFlags.Remove("form");
doc.Load(myTestHtm);
foreach (var v in doc.DocumentNode.SelectNodes("//form"))
{
    Console.WriteLine(v.OuterHtml);
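}
Answered on 2013-04-27 18:02:10
You are looking for the XPath expression
./form[input]
This returns every <form/> element, together with its subtree, that contains at least one <input/> child element.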
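As a rough sketch of how this expression might be applied per table cell: the predicate can only match if the parser has actually placed each <input> inside its <form>, so the example below assumes the ElementsFlags adjustment from the other answer is made before loading. The class name FormExtractor and the method GetForms are hypothetical names used only for illustration, and the exact behavior may differ between Html Agility Pack versions.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, not part of Html Agility Pack.
public static class FormExtractor
{
    public static List<HtmlNode> GetForms(string html)
    {
        // Assumption: drop the special handling of <form> so that it is parsed
        // as an ordinary container. This has to happen before the HTML is loaded.
        HtmlNode.ElementsFlags.Remove("form");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var forms = new List<HtmlNode>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection cells = doc.DocumentNode.SelectNodes("//td");
        if (cells == null) return forms;

        foreach (HtmlNode cell in cells)
        {
            // Forms that are direct children of this cell and have an <input> child.
            HtmlNodeCollection inCell = cell.SelectNodes("./form[input]");
            if (inCell == null) continue; // e.g. the empty <td> in the last row
            forms.AddRange(inCell);
        }
        return forms;
    }
}
```

Each returned node is then a whole form, so something like foreach (var f in FormExtractor.GetForms(html)) Console.WriteLine(f.OuterHtml); should print the form along with its input, provided the assumption above holds.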
https://stackoverflow.com/questions/16250551