
Web scraping an HTML table

Stack Overflow user
Asked 2022-08-02 07:15:40
1 answer · 71 views · 0 followers · 1 vote

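I am currently building a VBA macro that automatically gets the magnetic declination for a point on Earth. I have managed to automate filling in the correct coordinates and making sure the output has the right format. At first I wanted the output as a CSV file: automatically downloading that file and then extracting the right number works. However, I now want to use the HTML output option instead, because it avoids downloading a file and therefore (I think) makes the process faster. My problem is extracting the magnetic variation from the HTML pop-up. When I look at the page source, it is not clear to me how to get the right number, because the number has no ID and the table it is shown in has the same class name as the input table.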

I think I somehow need to loop through the elements of the output table and get the magnetic variation that way.
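A loop like the one described can be sketched in VBA as follows. It is a minimal, hypothetical debugging helper (`DumpTableCells` is not part of the original post): it walks every `<table>` in a late-bound MSHTML document such as `IE.document` and lists each cell with its table, row and cell index, which should make it easier to see at which position the declination appears once the result pop-up has loaded.

```vba
Option Explicit

' Hypothetical helper: returns one line per table cell, e.g.
' "table 2, row 0, cell 1: <cell text>", for every <table> in doc.
' doc is any MSHTML document (IE.document or CreateObject("htmlfile")).
Public Function DumpTableCells(ByVal doc As Object) As String
    Dim tables As Object, tblRows As Object, rowCells As Object
    Dim t As Long, r As Long, c As Long
    Dim result As String

    Set tables = doc.getElementsByTagName("table")
    For t = 0 To tables.Length - 1
        Set tblRows = tables.Item(t).Rows            ' all <tr> of this table
        For r = 0 To tblRows.Length - 1
            Set rowCells = tblRows.Item(r).Cells     ' all <td>/<th> of this row
            For c = 0 To rowCells.Length - 1
                result = result & "table " & t & ", row " & r & ", cell " & c & _
                         ": " & rowCells.Item(c).innerText & vbCrLf
            Next c
        Next r
    Next t
    DumpTableCells = result
End Function
```

For example, `Debug.Print DumpTableCells(IE.document)` after the result has appeared prints the list to the Immediate window; the indices can then be used to read a single cell directly.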

The magnetic declination is calculated with: https://www.ngdc.noaa.gov/geomag/calculators/magcalc.shtml#declination

The number I am trying to get is at the following location in the page source:

The VBA code I use to generate the output automatically:

```vba
Sub ScrapeWebData()
    Dim ieObj As InternetExplorer
    Dim htmlEle As IHTMLElement
    Dim i As Integer
    
    i = 2
    
    Set IE = CreateObject("InternetExplorer.Application")
    'Set ieObj = New InternetExplorer
    IE.Visible = True
    IE.navigate "https://www.ngdc.noaa.gov/geomag/calculators/magcalc.shtml"
    
    Application.Wait Now + TimeValue("00:00:05")
    
    Set doc = IE.document
    
    'Import right latitude and longitude into the Magnetic Declination Calculator
    
    'Fill in the latitude
    doc.getElementById("declinationLat1").Value = ThisWorkbook.Sheets("Sheet1").Range("A1").Value
    
    'Choose North or South
    If ThisWorkbook.Sheets("Sheet1").Range("B1").Value = "S" Then
        doc.all.Item("lat1Hemisphere")(0).Checked = True
    End If
    
    If ThisWorkbook.Sheets("Sheet1").Range("B1").Value = "N" Then
        doc.all.Item("lat1Hemisphere")(1).Checked = True
    End If
    
    'Fill in the longitude
    doc.getElementById("declinationLon1").Value = ThisWorkbook.Sheets("Sheet1").Range("C1").Value
    
    'Choose West or East
    If ThisWorkbook.Sheets("Sheet1").Range("D1").Value = "W" Then
        doc.all.Item("lon1Hemisphere")(0).Checked = True
    End If
    
    If ThisWorkbook.Sheets("Sheet1").Range("D1").Value = "E" Then
        doc.all.Item("lon1Hemisphere")(1).Checked = True
    End If
    
    'Choose format of the output file to be HTML
    doc.all.Item("resultFormat")(0).Checked = True
    
    doc.getElementById("calcbutton").Click

    'Here I tried to obtain the right number from the output, but it doesn't work sadly.
    
    For Each htmlEle In IE.document.getElementById("declinationResultContents")
        For Each htmlnew In IE.document.getElementsByClassName("shadow")(4).getElementsByTagName("tr")
        
        With ActiveSheet
            .Range("A" & i).Value = htmlnew.Children(0).textContent
        End With
        
        i = i + 1
        Next htmlnew
    Next htmlEle
    
End Sub
```

In my Excel sheet I use the following cells on Sheet1: A1 (latitude), B1 (N or S), C1 (longitude) and D1 (W or E).

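Is anyone familiar with web scraping HTML tables where the required number (and the table itself) has no explicit identifier? Could you help me? Thanks in advance!!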


1 Answer

Stack Overflow user

Accepted answer

Posted 2022-08-02 08:55:28

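Actually, I don't want to write any more answers about web scraping with IE. But since this page still seems to work flawlessly, I took a look at how the pop-up works.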

But first, a tip:

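The first line of every VBA module should be `Option Explicit`. It forces you to declare every variable: when you start a macro, VBA first checks that all variables used are declared. This is sometimes annoying, but it can save you very time-consuming debugging caused by nothing more than a typo.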
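A minimal illustration of the tip, assuming a standard module; `JoinCoordinate` is a hypothetical function used only to show the effect:

```vba
Option Explicit   ' first line of the module, before any Sub or Function

' Hypothetical function, only here to show the effect of Option Explicit.
Public Function JoinCoordinate(ByVal value As String, ByVal hemisphere As String) As String
    Dim result As String          ' must be declared, otherwise the module won't compile
    result = value & " " & hemisphere
    ' A typo such as "JoinCoordinate = reslt" would now stop the macro with
    ' "Compile error: Variable not defined" instead of silently returning "".
    JoinCoordinate = result
End Function
```

In the question's macro, `IE`, `doc` and `htmlnew` are used without a `Dim`, so with `Option Explicit` VBA would stop at the first of them with "Variable not defined" as soon as the macro is started.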

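There are more things in the macro that could be optimized, but all of it works. I also went through the whole procedure, because the way you commented your code let me understand right away what you wanted to do. That was a real pleasure.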

Delete the following code block. It doesn't work this way:

```vba
'Here I tried to obtain the right number from the output, but it doesn't work sadly.
For Each htmlEle In IE.document.getElementById("declinationResultContents")
    For Each htmlnew In IE.document.getElementsByClassName("shadow")(4).getElementsByTagName("tr")

    With ActiveSheet
        .Range("A" & i).Value = htmlnew.Children(0).textContent
    End With

    i = i + 1
    Next htmlnew
Next htmlEle
```

Replace it with the following code. You should be able to use it to read the data you actually need:

```vba
'The needed data is in the last HTML table of the document.
'This table is built when the button is clicked,
'so we have to wait until the browser has generated it.
'(It would be better to use a loop here instead of Application.Wait,
'but I use the quick and dirty way here ;-))
Application.Wait Now + TimeValue("00:00:05")

'getElementsByTagName returns a collection, so it cannot be assigned
'to htmlEle, which the question declares As IHTMLElement
Dim tables As Object
Set tables = IE.document.getElementsByTagName("table")
MsgBox tables(tables.Length - 1).innerText
```
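The loop mentioned in the comment could look roughly like this. It is a sketch under one assumption: that the result appears as an additional `<table>` in the document, so the helper (hypothetical name `WaitForNewTable`) polls the table count with a timeout instead of waiting a fixed time. The answer notes further down that its own attempt at such a loop caused follow-up problems on the page, so this may need adjusting.

```vba
Option Explicit

' Hypothetical helper: waits until doc contains more <table> elements than
' tablesBefore (assumed to mean the result table has been built), or until
' timeoutSeconds have passed. Returns True if a new table appeared.
Public Function WaitForNewTable(ByVal doc As Object, ByVal tablesBefore As Long, _
                                ByVal timeoutSeconds As Long) As Boolean
    Dim deadline As Date
    deadline = DateAdd("s", timeoutSeconds, Now)   ' based on Now, so midnight is no problem
    Do While doc.getElementsByTagName("table").Length <= tablesBefore
        If Now > deadline Then Exit Function       ' timed out: returns False
        DoEvents                                   ' let Excel and the browser process events
    Loop
    WaitForNewTable = True
End Function

' Usage inside ScrapeWebData, replacing Application.Wait after the click:
'     Dim tablesBefore As Long
'     tablesBefore = doc.getElementsByTagName("table").Length
'     doc.getElementById("calcbutton").Click
'     If Not WaitForNewTable(doc, tablesBefore, 15) Then
'         MsgBox "The result table did not appear within 15 seconds."
'         Exit Sub
'     End If
```

Polling with `DoEvents` keeps Excel responsive and returns as soon as the table exists, while the timeout prevents an endless loop if the page never produces a result.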

Edit

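Here is the whole macro as it works for me. I started implementing the loop so that `Application.Wait` would no longer be needed, but that led to follow-up problems on the page. Those could probably be solved too, but with relatively large effort, so I removed the loop to keep the code clear.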

```vba
Sub ScrapeWebData()
    Const url As String = "https://www.ngdc.noaa.gov/geomag/calculators/magcalc.shtml"
    Dim IE As Object
    Dim doc As Object
    Dim htmlEle As Object
    
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True
    IE.navigate url
    Application.Wait Now + TimeValue("00:00:10")
    Set doc = IE.document
    
    'Import right latitude and longitude into the Magnetic Declination Calculator
    'Fill in the latitude
    doc.getElementById("declinationLat1").Value = ThisWorkbook.Sheets("Sheet1").Range("A1").Value
    
    'Choose North or South
    If ThisWorkbook.Sheets("Sheet1").Range("B1").Value = "S" Then
        doc.all.Item("lat1Hemisphere")(0).Checked = True
    End If
    
    If ThisWorkbook.Sheets("Sheet1").Range("B1").Value = "N" Then
        doc.all.Item("lat1Hemisphere")(1).Checked = True
    End If
    
    'Fill in the longitude
    doc.getElementById("declinationLon1").Value = ThisWorkbook.Sheets("Sheet1").Range("C1").Value
    
    'Choose West or East
    If ThisWorkbook.Sheets("Sheet1").Range("D1").Value = "W" Then
        doc.all.Item("lon1Hemisphere")(0).Checked = True
    End If
    
    If ThisWorkbook.Sheets("Sheet1").Range("D1").Value = "E" Then
        doc.all.Item("lon1Hemisphere")(1).Checked = True
    End If
    
    'Choose format of the output file to be HTML
    doc.all.Item("resultFormat")(0).Checked = True
    
    doc.getElementById("calcbutton").Click
    
    'The needed data is in the last HTML table of the document.
    'This table is built when the button is clicked,
    'so we have to wait until the browser has generated it.
    Application.Wait Now + TimeValue("00:00:05")
    Set htmlEle = IE.document.getElementsByTagName("table")
    'You can write the result wherever you want:
    'here it is a message box, but you can also use a cell
    'or Debug.Print for testing purposes
    MsgBox htmlEle(htmlEle.Length - 1).getElementsByTagName("tbody")(0).getElementsByTagName("td")(1).innerText
End Sub
```
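The comments in the macro suggest writing the result to a cell instead of a message box. A minimal sketch of that, reading the same position as the macro (second `<td>` in the first `<tbody>` of the last table); the helper name `LastTableValue` and the target cell `E1` are hypothetical choices, and `IE.Quit` closes the browser window the macro opened:

```vba
Option Explicit

' Hypothetical helper: returns the text of the second <td> in the first
' <tbody> of the last <table>, i.e. the cell the macro above shows in a
' MsgBox. Returns "" instead of raising an error if that cell does not exist.
Public Function LastTableValue(ByVal doc As Object) As String
    Dim tables As Object, bodies As Object, tds As Object

    Set tables = doc.getElementsByTagName("table")
    If tables.Length = 0 Then Exit Function

    Set bodies = tables.Item(tables.Length - 1).getElementsByTagName("tbody")
    If bodies.Length = 0 Then Exit Function

    Set tds = bodies.Item(0).getElementsByTagName("td")
    If tds.Length < 2 Then Exit Function

    LastTableValue = tds.Item(1).innerText
End Function

' Usage at the end of ScrapeWebData, instead of the MsgBox
' (Sheet1!E1 is only an example target cell):
'     ThisWorkbook.Sheets("Sheet1").Range("E1").Value = LastTableValue(IE.document)
'     IE.Quit
'     Set IE = Nothing
```

Returning an empty string when the structure is not found makes a missing or not-yet-loaded result visible in the sheet rather than stopping the macro with a runtime error.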
Votes: 1
Original page content provided by Stack Overflow; translation support by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/73203401
