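I'm having trouble writing VBA code that scrapes data from div classes on a site and puts it into Excel. For privacy reasons (patient data) I can't post the URL, but the HTML looks like this: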
<div id="location-1" class="Location">
  <div class="grid">
    <div class="row">
      <div class="info">
        <div class="column">
          <div class="element-1">[text]</div>
          <div class="element-2">[text]</div>
          <p class="element-3"></p>
          <p class="element-4">[text]</p>
          <p class="element-5"></p>
          <p class="element-6">[text]</p>
          <div class="dir">
            <a href="[link]" class="dir" target="_blank">Get dir</a>
          </div>
        </div>
      </div>
    </div>
  </div>
</div>
My code is posted below. I'm trying to scrape the text from "element-1" and "element-2" into one row per source. Any help here would be greatly appreciated!
Sub webscrape()
    Dim http As New XMLHTTP60
    Dim html As New HTMLdocument
    Dim source As Object
    With http
        .Open "get", "[link]", False
        .send
        html.body.innerHTML = .responseText
    End With
    For Each source In html.getElementsByClassName("column")
        x = x + 1: Cells(x, 1) = source.getAttribute("element-1")
        Cells(x, 2) = source.getAttribute("element-2")
    Next source
End Sub
Answered 2020-11-24 21:25:40
Here are two different solutions (untested).
The first:
Sub webscrape()
    Dim http As New XMLHTTP60
    Dim html As New HTMLDocument
    Dim nodeColumnElements As Object
    Dim currentRow As Long

    currentRow = 1 'Set your start row here (1 is only a placeholder)

    With http
        .Open "GET", "[link]", False
        .send
        html.body.innerHTML = .responseText
    End With

    'All div elements inside the first "column" block
    Set nodeColumnElements = html.getElementsByClassName("column")(0).getElementsByTagName("div")
    'element-1 and element-2 are the first two divs; both go into the same row
    Cells(currentRow, 1) = Trim(nodeColumnElements(0).innerText)
    Cells(currentRow, 2) = Trim(nodeColumnElements(1).innerText)
End Sub
The second gets the two elements directly:
Sub webscrape()
    Dim http As New XMLHTTP60
    Dim html As New HTMLDocument
    Dim currentRow As Long

    currentRow = 1 'Set your start row here (1 is only a placeholder)

    With http
        .Open "GET", "[link]", False
        .send
        html.body.innerHTML = .responseText
    End With

    'Take the first match for each class name; both go into the same row
    Cells(currentRow, 1) = Trim(html.getElementsByClassName("element-1")(0).innerText)
    Cells(currentRow, 2) = Trim(html.getElementsByClassName("element-2")(0).innerText)
End Sub
https://stackoverflow.com/questions/64980493
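Both versions read only the first match (index 0). If the page contains several "Location" blocks and you want one row per block, as the question asks, a loop over every "column" element might look like the sketch below. It is untested, like the code above. The Sub name WebscrapeAllColumns, the start row of 1, and the variable names are assumptions for illustration only. It also assumes references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library" are set, and that each column holds at most one "element-1" and one "element-2".

```vba
Sub WebscrapeAllColumns()
    Dim http As New XMLHTTP60
    Dim html As New HTMLDocument
    Dim nodeColumns As Object
    Dim nodeMatches As Object
    Dim currentRow As Long
    Dim i As Long

    currentRow = 1 'Assumed start row; change as needed

    With http
        .Open "GET", "[link]", False
        .send
        html.body.innerHTML = .responseText
    End With

    Set nodeColumns = html.getElementsByClassName("column")

    'An index loop is used instead of For Each over the collection
    For i = 0 To nodeColumns.Length - 1
        'Look up each class inside the current column only
        Set nodeMatches = nodeColumns(i).getElementsByClassName("element-1")
        If nodeMatches.Length > 0 Then
            Cells(currentRow, 1) = Trim(nodeMatches(0).innerText)
        End If

        Set nodeMatches = nodeColumns(i).getElementsByClassName("element-2")
        If nodeMatches.Length > 0 Then
            Cells(currentRow, 2) = Trim(nodeMatches(0).innerText)
        End If

        currentRow = currentRow + 1 'One row per column block
    Next i
End Sub
```

The Length checks leave a cell empty when a block lacks one of the two elements, so the remaining rows stay aligned with their source blocks.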