首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >VBA抓取-将HTMLdoc转换为XML,但单击按钮时会遇到错误。

VBA抓取-将HTMLdoc转换为XML,但单击按钮时会遇到错误。
EN

Stack Overflow用户
提问于 2021-09-01 10:59:26
回答 1查看 85关注 0票数 1

我有一个MSHTML.HTMLDocument代码:

  1. 打开页面"https://www.ksestocks.com/HistoryHighLow"
  2. 填充输入,即786
  3. 然后单击一个按钮来获取表。
  4. 在这里,我使用以下代码捕获一行及其4个子行 SubKSE_GetHTMLDocument() Dim IE As New SHDocVw.InternetExplorer Dim HTMLDOC As MSHTML.HTMLDocument Dim HTMLInput As MSHTML.IHTMLElement Dim HTMLClasses As MSHTML.IHTMLElementCollection Dim HTMLClass As MSHTML.IHTMLElement Dim HTMLCel As MSHTML.IHTMLElement Dim colNum,rowNum,RowN,C As Integer Dim Cel As Range IE.Visible = False IE.Navigate "https://www.ksestocks.com/HistoryHighLow“,而IE.Visible IE.Visible循环用于每张Cel ()(”A3:a“&Cells(,(1) HTMLDOC.getElementsByTagName("input")(0).Click (.End(XlUp).Row)如果IsEmpty( Cel.Value ) = False,则设置HTMLDOC = IE.Document Set HTMLInput = HTMLDOC.getElementById("selscrip") HTMLInput.Value = Trim(Cel.Value) Debug.Print Cel.Value HTMLInput.Value,IE.Busy或IE.readyState < 4: DoEvents: Wend C=0,对于HTMLClass (“tr”),如果.End(,“最后3年(") >0如果离开(HTMLClass.innerText,14) =”最后3年“(如果C=1那么Cel.Offset(0 ),则对于HTMLClass.Children Debug.Print HTMLCel.innerText中的每个HTMLCel7).Value = HTMLCel.innerText ElseIf C=2然后Cel.Offset(0,8).Value = HTMLCel.innerText ElseIf C=3然后Cel.Offset(0,9).Value = HTMLCel.innerText ElseIf C=4然后Cel.Offset(0,10).Value = HTMLCel.innerText End如果C=C+1,下一端,如果是下一端,则是下一端,如果是下一端

上面的代码可以很好地从网站中获取值,但是当我将代码转换为XML时,它也停止了工作,而internet每次使用新窗口都没有结果。

我在哪里做错了?

有没有更强大的方式刮网页?

请在运行之前检查以下代码

代码语言:javascript
复制
Sub KSE_Get_XML()
    
    Dim XMLp As New MSXML2.XMLHTTP60
    Dim HTMLDOC As New MSHTML.HTMLDocument
    
    Dim HTMLInput As MSHTML.IHTMLElement
    
    Dim HTMLClasses As MSHTML.IHTMLElementCollection
    Dim HTMLClass As MSHTML.IHTMLElement
    
    Dim HTMLCel As MSHTML.IHTMLElement
    
    Dim colNum, rowNum, RowN, C As Integer
    
    XMLp.Open "GET", "https://www.ksestocks.com/HistoryHighLow", False
    XMLp.send
    
    HTMLDOC.body.innerHTML = XMLp.responseText
    
    Dim Cel As Range
    
 '   Do While HTMLDOC.ReadyState <> READYSTATE_COMPLETE
  '  Loop
    
    For Each Cel In Sheets("Sheet1").Range("A3:A" & Cells(Rows.Count, 1).End(xlUp).Row)
    If IsEmpty(Cel.Value) = False Then
        
        HTMLDOC.body.innerHTML = XMLp.responseText
        Set HTMLInput = HTMLDOC.getElementById("selscrip")

        HTMLInput.Value = Trim(Cel.Value)
        Debug.Print Cel.Value
        HTMLDOC.getElementsByTagName("input")(0).Click
        
        'Application.Wait Now + TimeValue("00:00:01")
       '' Do While HTMLDOC.ReadyState <> READYSTATE_COMPLETE
       '     DoEvents
      '  Loop

        C = 0
        For Each HTMLClass In HTMLDOC.getElementsByTagName("tr")
            If InStr(HTMLClass.innerText, "Last 3 years (") > 0 Then
                If Left(HTMLClass.innerText, 14) = "Last 3 years (" Then
                        For Each HTMLCel In HTMLClass.Children
                            Debug.Print HTMLCel.innerText
                            If C = 1 Then
                            Cel.Offset(0, 7).Value = HTMLCel.innerText
                            ElseIf C = 2 Then
                            Cel.Offset(0, 8).Value = HTMLCel.innerText
                            ElseIf C = 3 Then
                            Cel.Offset(0, 9).Value = HTMLCel.innerText
                            ElseIf C = 4 Then
                            Cel.Offset(0, 10).Value = HTMLCel.innerText
                            End If
                            C = C + 1
                        Next
                End If
            End If
            
            
        Next
   End If
   Next

End Sub
EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2021-09-01 19:26:15

完全摆脱IE,切换到xmlhttp请求,这是健壮和较不容易出错的。当您选择xhr时,您需要发出带有适当参数的post http请求。这是您可以做的事情,以获得结果旁边的Last 3 years (1 Sep 2018 - 1 Sep 2021)从那个表。

代码语言:javascript
复制
Public Sub GetContent()
    Const Url = "https://www.ksestocks.com/HistoryHighLow"
    Dim Http As Object, Html As HTMLDocument, Htmldoc As HTMLDocument
    Dim params$, I&, R&, ws As Worksheet, searchKeyword$
    
    Set Html = New HTMLDocument
    Set Htmldoc = New HTMLDocument
    Set Http = CreateObject("MSXML2.XMLHTTP")
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    R = 2
    
    searchKeyword = "786"   'you can use different search keywords here to get related results
    
    params = "selscrip=" & searchKeyword
    
    With Http
        .Open "POST", Url, False
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"
        .setRequestHeader "content-type", "application/x-www-form-urlencoded; charset=UTF-8"
        .send (params)
        Html.body.innerHTML = .responseText
    End With

    With Html.querySelectorAll("td.plain")
        For I = 0 To .Length - 1
            If InStr(.item(I).innerText, "Last 3 years") > 0 Then
                Htmldoc.body.innerHTML = "<table>" & .item(I).ParentNode.outerHTML & "</table>"
                ws.Cells(R, 1) = Htmldoc.querySelectorAll("td.plain")(1).innerText
                ws.Cells(R, 2) = Htmldoc.querySelectorAll("td.plain")(2).innerText
                ws.Cells(R, 3) = Htmldoc.querySelectorAll("td.plain")(3).innerText
                ws.Cells(R, 4) = Htmldoc.querySelectorAll("td.plain")(4).innerText
            End If
        Next I
    End With
End Sub

提及添加:

代码语言:javascript
复制
1. Microsoft XML, v6.0
2. Microsoft HTML Object Library

您的搜索关键字将是您在这幅图像中看到的。

票数 2
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/69012492

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档