
How to parse repeated HTML elements by class attribute?

Stack Overflow user
Asked 2022-02-05 23:26:01
Answers: 1, Views: 74, Followers: 0, Votes: -1

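I'm trying to parse an HTML file in which essentially the same markup is repeated for every row.

I want to get this output:

BTC - Bitcoin, BEP20, Bitcoin

ETH - ERC20, BEP20, POLYGON, ARBITRUM, AURORA, METISEVM

USDT - OMNI, TRC20, ERC20, BEP20(BSC), HECO, POLYGON, FTM, AVAX-C, ARBITRUM, METISEVM

QASH - ERC20

Here is a sample of the HTML: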

<div data-v-326d86f4="" class="table-box">
   <table data-v-326d86f4="">
      <tr data-v-326d86f4="">
         <td data-v-326d86f4="">BTC</td>
         <td data-v-326d86f4="" class="block-chain">
            <div data-v-326d86f4="" class="chain_box"><span data-v-326d86f4="" class="chain_name">Bitcoin</span> <span data-v-326d86f4=""><i data-v-326d86f4="" class="fa fa-caret-down"></i></span></div>
            <div data-v-326d86f4="" class="select-list"><span data-v-326d86f4="">Bitcoin</span><span data-v-326d86f4="">BEP20(BSC)</span><span data-v-326d86f4="">Bitcoin(SegWit)</span></div>
         </td>
         <td data-v-326d86f4="">0.001</td>
         <td data-v-326d86f4="">0.002</td>
      </tr>
      <tr data-v-326d86f4="">
         <td data-v-326d86f4="">ETH</td>
         <td data-v-326d86f4="" class="block-chain">
            <div data-v-326d86f4="" class="chain_box"><span data-v-326d86f4="" class="chain_name">ERC20</span> <span data-v-326d86f4=""><i data-v-326d86f4="" class="fa fa-caret-down"></i></span></div>
            <div data-v-326d86f4="" class="select-list"><span data-v-326d86f4="">ERC20</span><span data-v-326d86f4="">BEP20(BSC)</span><span data-v-326d86f4="">POLYGON</span><span data-v-326d86f4="">ARBITRUM</span><span data-v-326d86f4="">AURORA</span><span data-v-326d86f4="">METISEVM</span></div>
         </td>
         <td data-v-326d86f4="">0.012</td>
         <td data-v-326d86f4="">0.024</td>
      </tr>
      <tr data-v-326d86f4="">
         <td data-v-326d86f4="">USDT</td>
         <td data-v-326d86f4="" class="block-chain">
            <div data-v-326d86f4="" class="chain_box"><span data-v-326d86f4="" class="chain_name">OMNI</span> <span data-v-326d86f4=""><i data-v-326d86f4="" class="fa fa-caret-down"></i></span></div>
            <div data-v-326d86f4="" class="select-list"><span data-v-326d86f4="">OMNI</span><span data-v-326d86f4="">TRC20</span><span data-v-326d86f4="">ERC20</span><span data-v-326d86f4="">BEP20(BSC)</span><span data-v-326d86f4="">HECO</span><span data-v-326d86f4="">POLYGON</span><span data-v-326d86f4="">FTM</span><span data-v-326d86f4="">AVAX-C</span><span data-v-326d86f4="">ARBITRUM</span><span data-v-326d86f4="">METISEVM</span></div>
         </td>
         <td data-v-326d86f4="">30</td>
         <td data-v-326d86f4="">50</td>
      </tr>
      <tr data-v-326d86f4="">
         <td data-v-326d86f4="">QASH</td>
         <td data-v-326d86f4="" class="block-chain">
            <div data-v-326d86f4="" class="chain_box">
               <span data-v-326d86f4="" class="chain_name">ERC20</span> <!---->
            </div>
            <!---->
         </td>
         <td data-v-326d86f4="">513</td>
         <td data-v-326d86f4="">1026</td>
      </tr>
      <!-- ... -->

I'm using the HtmlAgilityPack library, but without success:

Dim arqHtml As String = "C:\Users\Mattia\Desktop\ready.html"
Dim myHtml As HtmlAgilityPack.HtmlDocument = New HtmlAgilityPack.HtmlDocument()
myHtml.Load(arqHtml)
Dim myTable As HtmlAgilityPack.HtmlNode = myHtml.DocumentNode.SelectSingleNode("//table")
Dim myRows As HtmlAgilityPack.HtmlNodeCollection = myTable.SelectNodes("tr")
For Each tmpRow As HtmlAgilityPack.HtmlNode In myRows
    Dim myCells As HtmlAgilityPack.HtmlNodeCollection = tmpRow.SelectNodes("td")
    If myCells IsNot Nothing Then
        Dim myToken As String = myCells(0).InnerText
        Dim mySpans As HtmlAgilityPack.HtmlNodeCollection = myCells(1).SelectNodes("div[contains(@class,'select-list')]/span")
        If mySpans IsNot Nothing Then
            Dim myListBChain As New List(Of String)
            For Each mySpan As HtmlAgilityPack.HtmlNode In mySpans
                RichTextBox1.Text += mySpan.InnerText
            Next
            Dim allItensAsString = String.Join(", ", richtextbox1.text)
        End If
    End If
Next

This returns the following output:

BitcoinBEP20(BSC)Bitcoin(SegWit)ERC20BEP20(BSC)POLYGONARBITRUMAURORAMETISEVMOMNITRC20ERC20BEP20(BSC)HECOPOLYGONFTMAVAX-CARBITRUMMETISEVMEOSBEP20(BSC)ERC20BEP20(BSC)TRC20BEP20(BSC)ZILBEP20(BSC)NEOLEGACYNEON3ERC20POLYGONERC20DAGBEP2BEP20(BSC)FTMAVAX-CERC20BEP20(BSC)ERC20BEP20(BSC)ERC20HECOBEP20(BSC)ERC20HECOERC20POLYGONERC20HECOERC20POLYGONERC20BEP20(BSC)BCHBEP20(BSC)ERC20LOOPPOLYGONBEP20(BSC)FTMAVAX-CMETISEVMERC20TOLERC20METAERC20BEP20(BSC)

How can I make it return the output I want?

1 Answer

Stack Overflow user
Accepted answer
Answered 2022-02-06 09:51:33

Further to my comment on the original question, in the last <tr> of the sample...

<tr data-v-326d86f4="">
    <td data-v-326d86f4="">QASH</td>
    <td data-v-326d86f4="" class="block-chain">
    <div data-v-326d86f4="" class="chain_box">
        <span data-v-326d86f4="" class="chain_name">ERC20</span> <!---->
    </div>
    <!---->
    </td>
    <td data-v-326d86f4="">513</td>
    <td data-v-326d86f4="">1026</td>
</tr>

...the second <td> does not contain a <div class="select-list" ... >, so...

myCells(1).SelectNodes("div[contains(@class,'select-list')]/span")

...returns Nothing, hence the NullReferenceException.
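To see this behaviour in isolation, here is a minimal sketch. It assumes the HtmlAgilityPack package is referenced; the module and function names (SelectNodesDemo, HasNoSelectList) are hypothetical and only serve the illustration. It parses a fragment of cell markup and reports whether the XPath query matched anything.

```vbnet
Module SelectNodesDemo
    ' Hypothetical helper: returns True when the given cell markup contains
    ' no <span> inside a <div> whose class contains "select-list".
    Public Function HasNoSelectList(cellHtml As String) As Boolean
        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(cellHtml)

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        Dim spans As HtmlAgilityPack.HtmlNodeCollection =
            doc.DocumentNode.SelectNodes("//div[contains(@class,'select-list')]/span")

        Return spans Is Nothing
    End Function
End Module
```

For the QASH cell, which only has the chain_box div, the function should return True; for the BTC cell it should return False. Any code that enumerates the result without first checking for Nothing will therefore fail on rows like QASH.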

As for building the desired output, you first need to test whether such a <div class="select-list" ... > element exists...

If mySpans Is Nothing Then

If it doesn't, then take the content of the <div class="chain_box" ... ><span class="chain_name" ... > element...

Dim chainTextNode As HtmlAgilityPack.HtmlNode = myCells(1).SelectSingleNode(
    "div[contains(@class, 'chain_box')]/span[contains(@class, 'chain_name')]"
)

chainText = If(chainTextNode Is Nothing OrElse String.IsNullOrWhiteSpace(chainTextNode.InnerText), "(unknown)", chainTextNode.InnerText)

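I've added some extra handling in case that element doesn't exist or has no value.

If there is a <div class="select-list" ... > element, then take the values of its child <span ... > elements, separated by commas...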

chainText = String.Join(", ", mySpans.Select(Function(span) span.InnerText))
' Alternative: chainText = String.Join(", ", From span In mySpans Select span.InnerText)

Finally, build the new line and append it to the text box...

RichTextBox1.Text &= $"{myToken} - {chainText}{Environment.NewLine}"

The complete code looks like this...

' Note: the .Select call below needs Imports System.Linq at the top of the file.
Dim arqHtml As String = "C:\Users\Mattia\Desktop\ready.html"
Dim myHtml As HtmlAgilityPack.HtmlDocument = New HtmlAgilityPack.HtmlDocument()
myHtml.Load(arqHtml)
Dim myTable As HtmlAgilityPack.HtmlNode = myHtml.DocumentNode.SelectSingleNode("//table")

Dim myRows As HtmlAgilityPack.HtmlNodeCollection = myTable.SelectNodes("tr")
For Each tmpRow As HtmlAgilityPack.HtmlNode In myRows
    Dim myCells As HtmlAgilityPack.HtmlNodeCollection = tmpRow.SelectNodes("td")
    ' Skip rows without cells, or with too few cells to hold a token and a chain list.
    If myCells IsNot Nothing AndAlso myCells.Count >= 2 Then
        Dim myToken As String = myCells(0).InnerText
        ' Nothing when the cell has no <div class="select-list">.
        Dim mySpans As HtmlAgilityPack.HtmlNodeCollection = myCells(1).SelectNodes("div[contains(@class,'select-list')]/span")
        Dim chainText As String

        If mySpans Is Nothing Then
            ' No select-list: fall back to the single chain_name span.
            Dim chainTextNode As HtmlAgilityPack.HtmlNode = myCells(1).SelectSingleNode(
                "div[contains(@class, 'chain_box')]/span[contains(@class, 'chain_name')]"
            )

            chainText = If(chainTextNode Is Nothing OrElse String.IsNullOrWhiteSpace(chainTextNode.InnerText), "(unknown)", chainTextNode.InnerText)
        Else
            ' Join the text of every span in the select-list with commas.
            chainText = String.Join(", ", mySpans.Select(Function(span) span.InnerText))
            ' Alternative: chainText = String.Join(", ", From span In mySpans Select span.InnerText)
        End If

        ' One output line per table row.
        RichTextBox1.Text &= $"{myToken} - {chainText}{Environment.NewLine}"
    End If
Next
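The branching above can also be pulled out of the parsing loop so it can be checked without an HTML file. The following is only a sketch under that assumption: the module and function names (ChainFormatting, BuildChainText, BuildLine) are hypothetical and do not come from the original answer. spanTexts stands for the InnerText values of the select-list spans, or Nothing when the row has no select-list, and fallbackText stands for the chain_name text.

```vbnet
Imports System
Imports System.Collections.Generic

Module ChainFormatting
    ' Hypothetical helper mirroring the If/Else in the loop:
    ' no select-list -> use the chain_name text (or "(unknown)" if it is blank),
    ' otherwise -> join all span texts with ", ".
    Public Function BuildChainText(spanTexts As IEnumerable(Of String), fallbackText As String) As String
        If spanTexts Is Nothing Then
            Return If(String.IsNullOrWhiteSpace(fallbackText), "(unknown)", fallbackText)
        End If

        Return String.Join(", ", spanTexts)
    End Function

    ' Hypothetical helper: formats one output line without the trailing newline.
    Public Function BuildLine(token As String, chainText As String) As String
        Return $"{token} - {chainText}"
    End Function
End Module
```

Inside the loop this could be called as BuildLine(myToken, BuildChainText(spanTexts, fallbackText)), where spanTexts is Nothing whenever mySpans is Nothing.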

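If you have a very large input HTML file, you might consider...

  • Appending each iteration's line to a StringBuilder: outputBuilder.Append($"{myToken} - {chainText}{Environment.NewLine}") ...and then setting RichTextBox1.Text once after the loop: RichTextBox1.Text = outputBuilder.ToString()
  • Calling RichTextBox1.SuspendLayout() before the loop and RichTextBox1.ResumeLayout() after it

...to improve performance. However, using either of these approaches means that RichTextBox1 won't display any output until the HTML has been completely processed.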
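As a rough sketch of the StringBuilder approach, assuming the token and chain text of each row have already been extracted: the module name OutputBuffering, the function BuildOutput, and the use of KeyValuePair to carry each row are all hypothetical choices for this illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module OutputBuffering
    ' Hypothetical helper: each pair holds a token (Key) and its chain text (Value).
    ' All lines are accumulated in memory and returned as a single string.
    Public Function BuildOutput(rows As IEnumerable(Of KeyValuePair(Of String, String))) As String
        Dim outputBuilder As New StringBuilder()

        For Each row As KeyValuePair(Of String, String) In rows
            outputBuilder.Append($"{row.Key} - {row.Value}{Environment.NewLine}")
        Next

        Return outputBuilder.ToString()
    End Function
End Module
```

The text box is then assigned once, for example RichTextBox1.Text = OutputBuffering.BuildOutput(rows), instead of being appended to on every iteration.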

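Original page content provided by Stack Overflow; translation support on the source page provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link: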

https://stackoverflow.com/questions/71002994
