首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >如何使用VBA查找给定搜索项的图像结果数

如何使用VBA查找给定搜索项的图像结果数
EN

Stack Overflow用户
提问于 2017-02-21 16:42:28
回答 2查看 508关注 0票数 0

我一直在利用Excel中的HTML,试图近似不同分辨率下的常见图像。我希望能得到一些动态的信息--用户输入一个搜索项,代码循环通过一组预定义的图像分辨率,在指定的分辨率之间对该搜索项的常见图像进行排序。

但是,第一步是获得一种可靠(且快速)的方法,以返回特定分辨率下的图像数量。我写了这段代码:

代码语言:javascript
复制
Sub GoogleWithURL() 'requires Microsoft HTML Object Library

    Dim url As String, searchTerm As String
    Dim objIE As InternetExplorer 'special object variable representing the IE browser
    Dim ws As Worksheet
    Set ws = ThisWorkbook.Sheets("sheet1")
    Dim currPage As HTMLDocument
    Dim xRes As Integer, yRes As Integer
    With ws
        xRes = .Range("XRes")
        yRes = .Range("YRes")
        searchTerm = .Range("search")
    End With

    'create URL to page with these image criteria
    url = WorksheetFunction.Concat("https://www.google.com/search?q=", searchTerm, _
                        "&tbm=isch&source=lnt&tbs=isz:ex,iszw:", xRes, "iszh:", yRes)

    'initiating a new instance of Internet Explorer and asigning it to objIE
    Set objIE = New InternetExplorer
    'objIE.Visible = True 'for debugging purposes

    'Google images search
    objIE.navigate url
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
    Set currPage = objIE.document

    'Count image results
    Set valueResult = currPage.getElementById("rg_s").getElementsByTagName("IMG")
    MsgBox WorksheetFunction.Concat("'", searchTerm, "' returns ", valueResult.Length _
    , " images @ ", xRes, "x", yRes, "px.") 'returns number of loaded images on page

    'close the browser
    On Error Resume Next 'required when the browser is visible and I close it manually half way
    objIE.Quit

End Sub

它将internet对象导航到特定分辨率的谷歌图像搜索中,计算rg_s ID中的图像数量(这些是图像结果,而不是横幅图像等等)。然后,它将该计数作为消息框返回。(当我最终实现这一点时,我将返回工作表上列中的值,遍历30种不同的分辨率)

问题

此代码的主要问题是:

  • 它没有给出一个非常有用的计数。分辨率很低,因为它只计算加载的图像--这意味着大多数常见分辨率的搜索词,如1920x1080或1366x768,都会返回最大100幅图像。
  • 太慢了。对我来说,导航到页面,计数图像标签,这一切似乎很像在VBA中使用.Select。这就像一种人工方法,人类会做什么,因此效率很低。

解决方案

我可以想出一些方法来解决这些问题

  1. 数据的解析/获得更有用的计数
代码语言:javascript
复制
- Scroll down. If I can load more images, it's likely I'll be able to differentiate a little better. I found that scrolling as far down as I can (up to the _'Load more results'_ button) gives a cap of 400 not 100 - if there are at least that many images for a given resolution then I'm happy and I'll give it the top rank. Doesn't help with problem 2 though. Still, **how would I do that?**
- Narrow results. If 100 is returned, I could change the `filetype:` in the URL I send, like appending `filetype:png` to maybe halve the number of images returned, giving me a better spread in the 0-100 range. Not ideal though as I would have to iterate through multiple filetypes for some resolutions, slowing down the code and even then, not necessarily giving me what I want.
- Use Google's (or another search engine's) own values for this. I've asked this on various sites and in various forms, _is there any data on image count available direct from Google_ - i.e. without returning (and slowly loading) the images themselves. Like the `about 1,300,500 results in 0.03 seconds` for a regular search, only for images? If I could use a pre-calculated value each time that samples a bigger array than the 100 results, I might get a much more detailed picture.

  1. 慢度
代码语言:javascript
复制
- Try a different kind of HTTP request. Right now I open an instance of internet explorer and navigate to a page. That sounds very _human style_, I would prefer a computer style request. What I mean is, instead of using my laptop to trawl through images one by one, I get Google's supercomputers to do the leg work by only _asking_ for the count, as oppose to the images themselves. Not a clue how to do this. I'm aware of another 2 ways to search the web from within Excel; web query and `CreateObject("MSXML2.serverXMLHTTP")`. Don't know either of those much but if you think they would be a better way to go then I'll look into them more closely.

摘要

希望这还能继续下去,我认为我的思路应该相当清楚。关于如何向下滚动/加载更多图片/让Google返回一个计数而不是图像本身的实际答案将是最好的,关于追求什么的建议也是有用的。

EN

回答 2

Stack Overflow用户

回答已采纳

发布于 2017-02-21 18:38:11

瓶颈不在for循环中。它是在打开浏览器并将其指向某个位置。如果您担心时间问题,那么您应该抓取一个已经打开到该页面的浏览器,在您对其进行所有搜索之前,不要关闭它。每次搜索时至少要节省2秒。我运行了以下代码并获得了这些时间:

打开并设置资源管理器的时间: 2.41秒。

数100张照片的时间(1):0.1秒。

数100张照片的时间(2):0.11秒。

我们的方法之间的差别是1/100秒。

此外,Google图像要求用户向下页面,以便调用下一个100张图片。如果您能够找到ajax或javascript语句来实现这一点,那么您将能够让它认为它有页面。这就是为什么你只能得到100张图片。

或者你可以打开一个浏览器,输入你的搜索词,然后向下页面,直到屏幕上有299张图片,这时你会发现一个按钮,上面写着“显示更多的图片”。然后截取打开的网页。

如果您正在运行多个搜索项,超出打开和关闭浏览器的时间瓶颈,则不包括图像。

代码语言:javascript
复制
Sub GoogleWithURL() 'requires Microsoft HTML Object Library
' https://www.google.com/search?q=St+Mary&source=lnms&tbm=isch&sa=X&ved=0ahUKEwj99ay14aPSAhWDMSYKHadiCjkQ_AUICSgC&biw=1600&bih=840
    Dim url As String
    Dim objIE As InternetExplorer 'special object variable representing the IE browser
    Dim ws As Worksheet
    Set ws = ThisWorkbook.Sheets("Sheet1")
    Dim currPage As HTMLDocument
    Dim StartTime As Double, SecondsElapsed As Double

    '****************************************
    '   Hard code url to search images of St Mary
    url = "https://www.google.com/search?q=St+Mary&source=lnms&tbm=" & _
            "isch&sa=X&ved=0ahUKEwj99ay14aPSAhWDMSYKHadiCjkQ_AUICSgC&biw=1600&bih=840"

    StartTime = Timer
    Set objIE = New InternetExplorer
    objIE.Visible = True
    objIE.navigate url
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
    Set currPage = objIE.document
    SecondsElapsed = Round(Timer - StartTime, 2)
    Debug.Print "Time to open and set Explorer:  " & SecondsElapsed & " seconds."


    StartTime = Timer
    Set valueResult = currPage.getElementById("rg_s").getElementsByTagName("IMG")
    For Each pic In valueResult
        counter = counter + 1
    Next pic
    SecondsElapsed = Round(Timer - StartTime, 2)

    Debug.Print "Time to Count " & counter & " Photos(1):  " & SecondsElapsed & " seconds."

    counter = 0
    StartTime = Timer
    Set valueResult = currPage.getElementsByTagName("IMG")
    For Each pic In valueResult
        If InStr(1, pic.className, "rg") > 0 Then
            counter = counter + 1
        End If
    Next pic
    SecondsElapsed = Round(Timer - StartTime, 2)

    Debug.Print "Time to Count " & counter & " Photos(2):  " & SecondsElapsed & " seconds."

    On Error Resume Next 'required when the browser is visible and I close it manually half way
    objIE.Quit

End Sub
票数 1
EN

Stack Overflow用户

发布于 2017-03-29 13:52:36

在问了几个问题之后,现在我觉得自己更聪明了,为此我做了一个UDF:

代码语言:javascript
复制
Public Function GOOGLE_COUNT(searchTerm As String, xRes As Long, yRes As Long, Optional timeout As Long = 10) As Long

    Dim url As String
    Dim objIE As InternetExplorer
    Dim currPage As HTMLDocument
    Dim stTimer As Double, tElapsed As Single
    Dim valueResult As IHTMLElementCollection

    'create URL to page with these image criteria
    url = "https://www.google.com/search?q=" & searchTerm & _
                        "&tbm=isch&source=lnt&tbs=isz:ex,iszw:" & xRes & ",iszh:" & yRes

    'initiating a new instance of Internet Explorer and asigning it to objIE
    Set objIE = New InternetExplorer

    'Google images search
    objIE.navigate url
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
    Set currPage = objIE.document
    Dim myDiv As HTMLDivElement: Set myDiv = currPage.getElementById("fbar")
    Dim elemRect As IHTMLRect: Set elemRect = myDiv.getBoundingClientRect
    stTimer = Timer
    'Scroll until bottom of page is in view
    Do Until elemRect.bottom > 0 Or tElapsed > timeout 'timeout after n seconds
        currPage.parentWindow.scrollBy 0, 10000
        Set elemRect = myDiv.getBoundingClientRect
        tElapsed = Timer - stTimer
    Loop
    myDiv.ScrollIntoView
    'Count the images
    Set valueResult = currPage.getElementById("rg_s").getElementsByTagName("IMG")
    GOOGLE_COUNT = valueResult.Length
    objIE.Quit
    
End Function

受雇于此:搜索1366:768大小的“圣玛丽”

代码语言:javascript
复制
=GOOGLE_COUNT("St. Mary", 1366, 768)

或者是10秒超时(如果10秒过去了,搜索就停止滚动,只对加载的图像进行计数)。

代码语言:javascript
复制
=GOOGLE_COUNT("St. Mary", 1366, 768, 10)

我在another question中解释了滚动是如何工作的,它现在是混乱的,但功能强大。

重要信息:

正如约翰·马金斯所指出的,重要的时间是在装货上,不算在内。特别是打开和关闭InternetExplorer。因此,为了避免大量的重新计算时间;如果(像我一样)您想检查一个以上的术语/分辨率,将这段代码放在宏中,而不是函数中(如果您认为我应该发布这个)。此UDF仅用于一次性搜索。

希望这是有用的,我想我应该重新审视这个问题,把我得到的答案贴出来。

最后注:

  1. 您的计算机(可能)没有崩溃,它的功能只是计算。
  2. 对于搜索术语,输入在google搜索栏中输入的任何内容。“美洲虎”(Jaguar-car)返回动物的图像,而不是汽车公司。
  3. 结果是一个数字0-400;0-399是实际计数的图像数(只要你设置的超时足够大-自动是10s)。400是最大的,所以在那个分辨率下,可能有400多张图片可用于该术语。
票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/42373117

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档