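I am trying to extract the name, address, role, status, appointment date, and resignation date (if any) from a web page; the code samples are below.
The problem is that the number of officers can differ from company to company, and I don't know how to determine the total number of officers (the elements with class appointment-1 through appointment-x) so that I can loop over them.
HTML code: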
<div class="appointments-list">
<div class="appointment-1">
<h2 class="heading-medium">
<span id="officer-name-1">
<a href="/officers/Oo16GI3lS3HEgrIR-kCpmLYbDWw/appointments" onclick="javascript:_paq.push(['trackGoal', 5]);">BUCKSEY, Nicholas</a>
</span>
</h2>
<dl>
<dt id="officer-address-field-1">Correspondence address</dt>
<dd class="data" id="officer-address-value-1">
1 St James's Square, London, SW1Y 4PD </dd>
</dl>
<div class="grid-row">
<dl class="column-quarter">
<dt>Role
<span id="officer-status-tag-1" class="status-tag font-xsmall">Active</span>
</dt>
<dd id="officer-role-1" class="data">
Secretary
</dd>
</dl>
<dl class="column-quarter">
<dt>Appointed on</dt>
<dd id="officer-appointed-on-1" class="data">
1 June 2020
</dd>
</dl>
</div>
<div class="grid-row"></div>
<div class="grid-row"></div>
<div class="grid-row"></div>
</div>
<div class="appointment-2">
<h2 class="heading-medium heading-with-border">
<span id="officer-name-2">
<a href="/officers/IND_i3_G7Gqq3ZzC3P0rXYbUcNU/appointments" onclick="javascript:_paq.push(['trackGoal', 5]);">MATHEWS, Benedict John Spurway</a>
</span>
</h2>
</h2>
<dl>
<dt id="officer-address-field-2">Correspondence address</dt>
<dd class="data" id="officer-address-value-2">
1 St James's Square, London, SW1Y 4PD </dd>
</dl>
<div class="grid-row">
<dl class="column-quarter">
<dt>Role
<span id="officer-status-tag-2" class="status-tag font-xsmall">Active</span>
</dt>
<dd id="officer-role-2" class="data">
Secretary
</dd>
</dl>
<dl class="column-quarter">
<dt>Appointed on</dt>
<dd id="officer-appointed-on-2" class="data">
7 May 2019
</dd>
</dl>
</div>
<div class="grid-row"></div>
<div class="grid-row"></div>
<div class="grid-row"></div>
</div>

VBA code: I tried to use querySelectorAll, but it could not "identify" the correct class/id.
Sub ChangeTab()
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    ie.navigate "https://find-and-update.company-information.service.gov.uk/company/00102498/officers"
    Do While ie.readyState <> 4: DoEvents: Loop
    'Application.Wait (Now + TimeValue("0:00:02"))
    ' Dim i As Long, secNumberNodeList As Object, secNumberNode As Object
    Set secNumberNodeList = ie.Document.querySelectorAll("appointments-list")
    For Each sc In secNumberNodeList
        Debug.Print sc.getElementById("officer-name-1")
        Debug.Print sc.getElementById("officer-address-value-1")
        Debug.Print sc.getElementById("officer-status-tag-1")
        Debug.Print sc.getElementById("officer-appointed-on-1")
        Debug.Print sc.getElementById("officer-appointed-on-1")
        Debug.Print sc.getElementById("officer-resigned-on-16")
    Next
End Sub

Answer, posted on 2021-03-27 20:20:27
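Here is a robust way to do it. I used XMLHttpRequest instead of IE. The code shows how to loop over all the containers and read their contents; to parse the other fields you are interested in, define them inside the loop in the same way.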
Option Explicit
Sub GetInformation()
    Const URL = "https://find-and-update.company-information.service.gov.uk/company/00102498/officers"
    Dim Http As Object, Html As HTMLDocument, I&
    Dim HtmlDoc As HTMLDocument, sName$, sAddress$
    Set Html = New HTMLDocument
    Set HtmlDoc = New HTMLDocument
    Set Http = CreateObject("MSXML2.XMLHTTP")
    ' Fetch the page synchronously and load the response into an in-memory document
    With Http
        .Open "GET", URL, False
        .send
        Html.body.innerHTML = .responseText
    End With
    ' Select every direct child of .appointments-list whose class starts with "appointment-"
    With Html.querySelectorAll(".appointments-list > [class^='appointment-']")
        For I = 0 To .Length - 1
            ' Copy one container into a scratch document so selectors stay scoped to it
            HtmlDoc.body.innerHTML = .Item(I).outerHTML
            sName = HtmlDoc.querySelector("h2 > span > a").innerText
            sAddress = HtmlDoc.querySelector(".data[id^='officer-address-value-']").innerText
            Debug.Print sName, sAddress
        Next I
    End With
End Sub

References that must be added to run the script above:
1. Microsoft XML, v6.0
2. Microsoft HTML Object Library

Source: https://stackoverflow.com/questions/66834718
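
As a possible extension of the loop in the answer, here is a minimal sketch that also prints how many officer containers were found and reads the remaining fields from the question (role, status, appointed on, resigned on). It needs the same two references. TextOrEmpty and GetOfficerDetails are hypothetical names introduced for this illustration, not part of the original answer. The id prefixes are taken from the HTML in the question, except officer-resigned-on-, which is assumed from the id used in the asker's code. Since an active officer has no resignation date, the helper returns an empty string when a selector matches nothing instead of raising an error.

```vba
Option Explicit

' Hypothetical helper: text of the first node matching cssSelector
' (surrounding spaces removed), or "" when nothing matches
Private Function TextOrEmpty(ByVal doc As HTMLDocument, ByVal cssSelector As String) As String
    Dim node As Object
    Set node = doc.querySelector(cssSelector)
    If Not node Is Nothing Then TextOrEmpty = Trim$(node.innerText)
End Function

Sub GetOfficerDetails()
    Const URL = "https://find-and-update.company-information.service.gov.uk/company/00102498/officers"
    Dim Http As Object, Html As HTMLDocument, HtmlDoc As HTMLDocument
    Dim I As Long

    Set Html = New HTMLDocument
    Set HtmlDoc = New HTMLDocument
    Set Http = CreateObject("MSXML2.XMLHTTP")
    With Http
        .Open "GET", URL, False
        .send
        Html.body.innerHTML = .responseText
    End With

    With Html.querySelectorAll(".appointments-list > [class^='appointment-']")
        ' .Length is the number of appointment containers in the fetched HTML
        Debug.Print "Officers found: " & .Length
        For I = 0 To .Length - 1
            ' Scope the lookups to one container by copying it into a scratch document
            HtmlDoc.body.innerHTML = .Item(I).outerHTML
            Debug.Print TextOrEmpty(HtmlDoc, "[id^='officer-name-']"), _
                        TextOrEmpty(HtmlDoc, "[id^='officer-address-value-']"), _
                        TextOrEmpty(HtmlDoc, "[id^='officer-role-']"), _
                        TextOrEmpty(HtmlDoc, "[id^='officer-status-tag-']"), _
                        TextOrEmpty(HtmlDoc, "[id^='officer-appointed-on-']"), _
                        TextOrEmpty(HtmlDoc, "[id^='officer-resigned-on-']") ' assumed id prefix
        Next I
    End With
End Sub
```

Matching on the id prefix rather than a fixed id such as officer-name-1 is what lets the same selectors work for every container. The count only reflects the containers present in the HTML that was fetched. As for the attempt in the question, the selector "appointments-list" has no leading dot, so it is read as a tag name rather than a class name and is unlikely to match anything.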