
Scraping innerHTML from a site using VBA

Stack Overflow user
Asked 2017-05-29 09:20:36
Answers: 1 | Views: 1.6K | Followers: 0 | Votes: 2

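I'm trying to declare an array of nodes (that part is not a problem) and then scrape the innerHTML of two child nodes inside each element of the array. Taking SE as an example (using the IE object method), let's assume I'm trying to scrape the title and excerpt of each question on the home page. There is an array of nodes (class name: "question-summary").

Each of those nodes has two child nodes (the title, class name "question-hyperlink", and the excerpt, class name "excerpt"). The code I'm using is as follows: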

Sub Scraper()
Dim ie As Object
Dim doc As Object, oQuestionShells As Object, oQuestionTitle As Object, oQuestion As Object, oElement As Object
Dim QuestionShell As String, QuestionTitle As String, Question As String, sURL As String

Set ie = CreateObject("internetexplorer.application")
sURL = "https://stackoverflow.com/questions/tagged/excel-formula"

QuestionShell = "question-summary"
QuestionTitle = "question-hyperlink"
Question = "excerpt"

With ie
    .Visible = False
    .Navigate sURL
End With

Set doc = ie.Document 'Stepping through so doc is getting assigned (READY_STATE = 4)

Set oQuestionShells = doc.getElementsByClassName(QuestionShell)

For Each oElement In oQuestionShells
    Set oQuestionTitle = oElement.getElementByClassName(QuestionTitle) 'Assigning this object causes an "Object doesn't support this property or method"
    Set oQuestion = oElement.getElementByClassName(Question) 'Assigning this object causes an "Object doesn't support this property or method"
    Debug.Print oQuestionTitle.innerHTML
    Debug.Print oQuestion.innerHTML
Next

End Sub

1 Answer

Stack Overflow user

Accepted answer

Answered 2017-05-29 09:41:09

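getElementByClassName is not a method.

You can only use getElementsByClassName (note the plural in the method name), which returns an IHTMLElementCollection.

Declaring the variable as Object instead of IHTMLElementCollection is fine, but you still have to access a specific element of the collection by providing an index.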
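
To illustrate the indexing point, here is a minimal sketch that walks every match by index. The Sub name PrintAllMatches and its parameters are hypothetical (they are not part of the original answer), and oParent is assumed to be an element or document obtained through the IE object as in the question.

```vba
' Hypothetical helper, not part of the original answer.
' oParent: an element or document obtained through the IE object.
' sClass:  the class name to look for, e.g. "question-hyperlink".
Sub PrintAllMatches(ByVal oParent As Object, ByVal sClass As String)
    Dim oMatches As Object
    Dim i As Long

    ' The plural method returns a collection, even for a single match.
    Set oMatches = oParent.getElementsByClassName(sClass)

    ' The collection is zero-based; Length is the number of matches.
    ' With no matches the loop body simply never runs.
    For i = 0 To oMatches.Length - 1
        Debug.Print i, oMatches(i).innerHTML
    Next i
End Sub
```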

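Let's assume that each oElement contains exactly one instance of the class question-hyperlink and one instance of the class excerpt. Then just use getElementsByClassName and append (0) to extract the first element of the returned collection.

So the corrected code is: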

Set oQuestionTitle = oElement.getElementsByClassName(QuestionTitle)(0)
Set oQuestion = oElement.getElementsByClassName(Question)(0)
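
The (0) index relies on every element actually having a matching child. If that might not hold for the page being scraped, a small guard is one option. The sketch below uses a hypothetical function name, FirstByClass, and returns Nothing when there is no match, so the caller can test the result before reading innerHTML.

```vba
' Hypothetical helper, not part of the original answer.
' Returns the first descendant of oParent with the given class,
' or Nothing when the collection is empty.
Function FirstByClass(ByVal oParent As Object, ByVal sClass As String) As Object
    Dim oMatches As Object

    Set oMatches = oParent.getElementsByClassName(sClass)

    If oMatches.Length > 0 Then
        Set FirstByClass = oMatches(0)
    Else
        Set FirstByClass = Nothing
    End If
End Function

' Possible use inside the For Each loop from the question:
'     Set oQuestionTitle = FirstByClass(oElement, QuestionTitle)
'     If Not oQuestionTitle Is Nothing Then Debug.Print oQuestionTitle.innerHTML
```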

Full working code (with a few updates, namely using Option Explicit and waiting for the page to load):

Option Explicit

Sub Scraper()

    Dim ie As Object
    Dim doc As Object, oQuestionShells As Object, oQuestionTitle As Object, oQuestion As Object, oElement As Object
    Dim QuestionShell As String, QuestionTitle As String, Question As String, sURL As String

    Set ie = CreateObject("internetexplorer.application")
    sURL = "https://stackoverflow.com/questions/tagged/excel-formula"

    QuestionShell = "question-summary"
    QuestionTitle = "question-hyperlink"
    Question = "excerpt"

    With ie
        .Visible = True
        .Navigate sURL
        Do
            DoEvents
        Loop While .ReadyState < 4 Or .Busy
    End With

    Set doc = ie.Document

    Set oQuestionShells = doc.getElementsByClassName(QuestionShell)

    For Each oElement In oQuestionShells
        'Debug.Print TypeName(oElement)

        Set oQuestionTitle = oElement.getElementsByClassName(QuestionTitle)(0)
        Set oQuestion = oElement.getElementsByClassName(Question)(0)

        Debug.Print oQuestionTitle.innerHTML
        Debug.Print oQuestion.innerHTML
    Next

    ie.Quit

End Sub
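
The wait loop above keeps spinning if the page never finishes loading. As a hedged variation, the same ReadyState and Busy check can be wrapped in a function with a timeout. The name WaitForPage and the 30-second default are assumptions chosen for illustration.

```vba
' Hypothetical helper, not part of the original answer.
' Waits until the IE object reports a complete, idle page, or until
' TimeoutSeconds have elapsed. Returns True only if the page loaded.
Function WaitForPage(ByVal ie As Object, _
                     Optional ByVal TimeoutSeconds As Long = 30) As Boolean
    Dim dStart As Date

    dStart = Now

    ' Same condition as the loop in the full code: READYSTATE_COMPLETE is 4.
    Do While ie.ReadyState < 4 Or ie.Busy
        DoEvents
        If DateDiff("s", dStart, Now) >= TimeoutSeconds Then
            Exit Function   ' WaitForPage keeps its default value, False
        End If
    Loop

    WaitForPage = True
End Function
```

In Scraper, the Do ... Loop could then be replaced by a call such as If Not WaitForPage(ie) Then ie.Quit: Exit Sub, placed after .Navigate sURL.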
Votes: 2
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/44238865
