
Scraping elements whose class names change

Stack Overflow user
Asked on 2021-03-27 18:24:12
Answers: 1 · Views: 88 · Followers: 0 · Votes: 1

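I am trying to extract the name, address, role, status, appointment date, and resignation date (if any) of each officer from a web page; a sample of its HTML is shown below.

The problem is that the number of directors differs from company to company, and I don't know how to determine the total number of directors (the x in class appointment-x) so that I can loop over them.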

HTML code:

<div class="appointments-list">
    <div class="appointment-1">
        <h2 class="heading-medium">
            <span id="officer-name-1">
                <a href="/officers/Oo16GI3lS3HEgrIR-kCpmLYbDWw/appointments"  onclick="javascript:_paq.push(['trackGoal', 5]);">BUCKSEY, Nicholas</a>
            </span>
        </h2>
        <dl>
            <dt id="officer-address-field-1">Correspondence address</dt>
            <dd class="data" id="officer-address-value-1">
1 St James&#39;s Square, London, SW1Y 4PD                    </dd>
        </dl>
        <div class="grid-row">
            <dl class="column-quarter">
                <dt>Role
                       <span id="officer-status-tag-1" class="status-tag font-xsmall">Active</span>
                </dt>
                <dd id="officer-role-1" class="data">
                    Secretary
                </dd>
            </dl>
            <dl class="column-quarter">
                <dt>Appointed on</dt>
                <dd id="officer-appointed-on-1" class="data">
                    1 June 2020
                </dd>
            </dl>
        </div> 
        <div class="grid-row"></div> 
        <div class="grid-row"></div> 
        <div class="grid-row"></div> 
    </div>
    <div class="appointment-2">
        <h2 class="heading-medium heading-with-border">
            <span id="officer-name-2">
                <a href="/officers/IND_i3_G7Gqq3ZzC3P0rXYbUcNU/appointments"  onclick="javascript:_paq.push(['trackGoal', 5]);">MATHEWS, Benedict John Spurway</a>
            </span>
        </h2>
        <dl>
            <dt id="officer-address-field-2">Correspondence address</dt>
            <dd class="data" id="officer-address-value-2">
1 St James&#39;s Square, London, SW1Y 4PD                    </dd>
        </dl>
        <div class="grid-row">
            <dl class="column-quarter">
                <dt>Role
                       <span id="officer-status-tag-2" class="status-tag font-xsmall">Active</span>
                </dt>
                <dd id="officer-role-2" class="data">
                    Secretary
                </dd>
            </dl>
            <dl class="column-quarter">
                <dt>Appointed on</dt>
                <dd id="officer-appointed-on-2" class="data">
                    7 May 2019
                </dd>
            </dl>
        </div> 
        <div class="grid-row"></div> 
        <div class="grid-row"></div> 
        <div class="grid-row"></div> 
    </div>
</div>

VBA code: I tried to use querySelectorAll, but could not work out how to target the right class.

Sub ChangeTab()
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    ie.navigate "https://find-and-update.company-information.service.gov.uk/company/00102498/officers"

    Do While ie.readyState <> 4: DoEvents: Loop
    
    'Application.Wait (Now + TimeValue("0:00:02"))
    ' Dim i As Long, secNumberNodeList As Object, secNumberNode As Object
 
    Set secNumberNodeList = ie.Document.querySelectorAll("appointments-list")
 
    For Each sc In secNumberNodeList 
        Debug.Print sc.getElementById("officer-name-1")
        Debug.Print sc.getElementById("officer-address-value-1")
        Debug.Print sc.getElementById("officer-status-tag-1")
        Debug.Print sc.getElementById("officer-appointed-on-1")
        Debug.Print sc.getElementById("officer-appointed-on-1")
        Debug.Print sc.getElementById("officer-resigned-on-16")
    Next
End Sub

1 Answer

Stack Overflow user

Accepted answer

Posted on 2021-03-27 20:20:27

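Here is one robust way to do it. I used XMLHttpRequest instead of IE. The selector matches every direct child of .appointments-list whose class attribute starts with appointment-, so the number of directors does not need to be known in advance, and the loop visits the content of each container in turn. To parse the other fields you are interested in, define them inside the loop in the same way.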

Option Explicit
Sub GetInformation()
    Const URL = "https://find-and-update.company-information.service.gov.uk/company/00102498/officers"
    Dim Http As Object, Html As HTMLDocument, I&
    Dim HtmlDoc As HTMLDocument, sName$, sAddress$

    Set Html = New HTMLDocument
    Set HtmlDoc = New HTMLDocument
    Set Http = CreateObject("MSXML2.XMLHTTP")

    ' Fetch the page synchronously and load the markup into an in-memory document
    With Http
        .Open "GET", URL, False
        .send
        Html.body.innerHTML = .responseText
    End With

    ' Match every container whose class starts with "appointment-" (appointment-1, appointment-2, ...)
    With Html.querySelectorAll(".appointments-list > [class^='appointment-']")
        For I = 0 To .Length - 1
            ' Re-parse the current container on its own so the selectors below stay scoped to it
            HtmlDoc.body.innerHTML = .Item(I).outerHTML
            sName = HtmlDoc.querySelector("h2 > span > a").innerText
            sAddress = HtmlDoc.querySelector(".data[id^='officer-address-value-']").innerText
            Debug.Print sName, sAddress
        Next I
    End With
End Sub
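
As a rough sketch of how the remaining fields could be defined inside the loop, the variant below uses the same approach and the same two references. `NodeText` and `GetAllOfficerFields` are hypothetical names introduced here for illustration. The id prefixes come from the HTML sample in the question, except `officer-resigned-on-`, which is assumed from the id used in the asker's own code and is not visible in the sample. Because `querySelector` returns `Nothing` when no element matches, the helper returns an empty string for officers who have no resignation date.

```vba
Option Explicit

' Hypothetical helper: returns the text of the first node matching Selector,
' with leading/trailing spaces trimmed, or "" when nothing matches.
Private Function NodeText(ByVal Doc As HTMLDocument, ByVal Selector As String) As String
    Dim Node As Object
    Set Node = Doc.querySelector(Selector)
    If Not Node Is Nothing Then NodeText = Trim$(Node.innerText)
End Function

Sub GetAllOfficerFields()
    Const URL = "https://find-and-update.company-information.service.gov.uk/company/00102498/officers"
    Dim Http As Object, Html As HTMLDocument, HtmlDoc As HTMLDocument
    Dim Containers As Object, I As Long

    Set Html = New HTMLDocument
    Set HtmlDoc = New HTMLDocument
    Set Http = CreateObject("MSXML2.XMLHTTP")

    ' Fetch the page synchronously and parse it in memory
    Http.Open "GET", URL, False
    Http.send
    Html.body.innerHTML = Http.responseText

    ' One node per appointment-N container; Length is the number of officers listed
    Set Containers = Html.querySelectorAll(".appointments-list > [class^='appointment-']")
    Debug.Print "Officers found: " & Containers.Length

    For I = 0 To Containers.Length - 1
        ' Scope the selectors to the current container only
        HtmlDoc.body.innerHTML = Containers.Item(I).outerHTML
        Debug.Print NodeText(HtmlDoc, "[id^='officer-name-']"), _
                    NodeText(HtmlDoc, "[id^='officer-address-value-']"), _
                    NodeText(HtmlDoc, "[id^='officer-role-']"), _
                    NodeText(HtmlDoc, "[id^='officer-status-tag-']"), _
                    NodeText(HtmlDoc, "[id^='officer-appointed-on-']"), _
                    NodeText(HtmlDoc, "[id^='officer-resigned-on-']") ' assumed id prefix
    Next I
End Sub
```

Matching on id prefixes rather than full ids avoids having to build `officer-name-1`, `officer-name-2`, and so on by hand, and `Containers.Length` gives the total the question asks for.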

References to add before running the script above:

1. Microsoft XML, v6.0
2. Microsoft HTML Object Library
Votes: 4
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/66834718
