我一直在利用Excel中的HTML,试图近似不同分辨率下的常见图像。我希望能得到一些动态的信息--用户输入一个搜索项,代码循环通过一组预定义的图像分辨率,在指定的分辨率之间对该搜索项的常见图像进行排序。
但是,第一步是获得一种可靠(且快速)的方法,以返回特定分辨率下的图像数量。我写了这段代码:
Sub GoogleWithURL() 'requires Microsoft HTML Object Library
Dim url As String, searchTerm As String
Dim objIE As InternetExplorer 'special object variable representing the IE browser
Dim ws As Worksheet
Set ws = ThisWorkbook.Sheets("sheet1")
Dim currPage As HTMLDocument
Dim xRes As Integer, yRes As Integer
With ws
xRes = .Range("XRes")
yRes = .Range("YRes")
searchTerm = .Range("search")
End With
'create URL to page with these image criteria
url = WorksheetFunction.Concat("https://www.google.com/search?q=", searchTerm, _
"&tbm=isch&source=lnt&tbs=isz:ex,iszw:", xRes, "iszh:", yRes)
'initiating a new instance of Internet Explorer and asigning it to objIE
Set objIE = New InternetExplorer
'objIE.Visible = True 'for debugging purposes
'Google images search
objIE.navigate url
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
Set currPage = objIE.document
'Count image results
Set valueResult = currPage.getElementById("rg_s").getElementsByTagName("IMG")
MsgBox WorksheetFunction.Concat("'", searchTerm, "' returns ", valueResult.Length _
, " images @ ", xRes, "x", yRes, "px.") 'returns number of loaded images on page
'close the browser
On Error Resume Next 'required when the browser is visible and I close it manually half way
objIE.Quit
End Sub它将internet对象导航到特定分辨率的谷歌图像搜索中,计算rg_s ID中的图像数量(这些是图像结果,而不是横幅图像等等)。然后,它将该计数作为消息框返回。(当我最终实现这一点时,我将返回工作表上列中的值,遍历30种不同的分辨率)
问题
此代码的主要问题是:
.Select。这就像一种人工方法,人类会做什么,因此效率很低。解决方案
我可以想出一些方法来解决这些问题
- Scroll down. If I can load more images, it's likely I'll be able to differentiate a little better. I found that scrolling as far down as I can (up to the _'Load more results'_ button) gives a cap of 400 not 100 - if there are at least that many images for a given resolution then I'm happy and I'll give it the top rank. Doesn't help with problem 2 though. Still, **how would I do that?**
- Narrow results. If 100 is returned, I could change the `filetype:` in the URL I send, like appending `filetype:png` to maybe halve the number of images returned, giving me a better spread in the 0-100 range. Not ideal though as I would have to iterate through multiple filetypes for some resolutions, slowing down the code and even then, not necessarily giving me what I want.
- Use Google's (or another search engine's) own values for this. I've asked this on various sites and in various forms, _is there any data on image count available direct from Google_ - i.e. without returning (and slowly loading) the images themselves. Like the `about 1,300,500 results in 0.03 seconds` for a regular search, only for images? If I could use a pre-calculated value each time that samples a bigger array than the 100 results, I might get a much more detailed picture.
- Try a different kind of HTTP request. Right now I open an instance of internet explorer and navigate to a page. That sounds very _human style_, I would prefer a computer style request. What I mean is, instead of using my laptop to trawl through images one by one, I get Google's supercomputers to do the leg work by only _asking_ for the count, as oppose to the images themselves. Not a clue how to do this. I'm aware of another 2 ways to search the web from within Excel; web query and `CreateObject("MSXML2.serverXMLHTTP")`. Don't know either of those much but if you think they would be a better way to go then I'll look into them more closely.
摘要
希望这还能继续下去,我认为我的思路应该相当清楚。关于如何向下滚动/加载更多图片/让Google返回一个计数而不是图像本身的实际答案将是最好的,关于追求什么的建议也是有用的。
发布于 2017-02-21 18:38:11
瓶颈不在for循环中。它是在打开浏览器并将其指向某个位置。如果您担心时间问题,那么您应该抓取一个已经打开到该页面的浏览器,在您对其进行所有搜索之前,不要关闭它。每次搜索时至少要节省2秒。我运行了以下代码并获得了这些时间:
打开并设置资源管理器的时间: 2.41秒。
数100张照片的时间(1):0.1秒。
数100张照片的时间(2):0.11秒。
我们的方法之间的差别是1/100秒。
此外,Google图像要求用户向下页面,以便调用下一个100张图片。如果您能够找到ajax或javascript语句来实现这一点,那么您将能够让它认为它有页面。这就是为什么你只能得到100张图片。
或者你可以打开一个浏览器,输入你的搜索词,然后向下页面,直到屏幕上有299张图片,这时你会发现一个按钮,上面写着“显示更多的图片”。然后截取打开的网页。
如果您正在运行多个搜索项,超出打开和关闭浏览器的时间瓶颈,则不包括图像。
Sub GoogleWithURL() 'requires Microsoft HTML Object Library
' https://www.google.com/search?q=St+Mary&source=lnms&tbm=isch&sa=X&ved=0ahUKEwj99ay14aPSAhWDMSYKHadiCjkQ_AUICSgC&biw=1600&bih=840
Dim url As String
Dim objIE As InternetExplorer 'special object variable representing the IE browser
Dim ws As Worksheet
Set ws = ThisWorkbook.Sheets("Sheet1")
Dim currPage As HTMLDocument
Dim StartTime As Double, SecondsElapsed As Double
'****************************************
' Hard code url to search images of St Mary
url = "https://www.google.com/search?q=St+Mary&source=lnms&tbm=" & _
"isch&sa=X&ved=0ahUKEwj99ay14aPSAhWDMSYKHadiCjkQ_AUICSgC&biw=1600&bih=840"
StartTime = Timer
Set objIE = New InternetExplorer
objIE.Visible = True
objIE.navigate url
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
Set currPage = objIE.document
SecondsElapsed = Round(Timer - StartTime, 2)
Debug.Print "Time to open and set Explorer: " & SecondsElapsed & " seconds."
StartTime = Timer
Set valueResult = currPage.getElementById("rg_s").getElementsByTagName("IMG")
For Each pic In valueResult
counter = counter + 1
Next pic
SecondsElapsed = Round(Timer - StartTime, 2)
Debug.Print "Time to Count " & counter & " Photos(1): " & SecondsElapsed & " seconds."
counter = 0
StartTime = Timer
Set valueResult = currPage.getElementsByTagName("IMG")
For Each pic In valueResult
If InStr(1, pic.className, "rg") > 0 Then
counter = counter + 1
End If
Next pic
SecondsElapsed = Round(Timer - StartTime, 2)
Debug.Print "Time to Count " & counter & " Photos(2): " & SecondsElapsed & " seconds."
On Error Resume Next 'required when the browser is visible and I close it manually half way
objIE.Quit
End Sub发布于 2017-03-29 13:52:36
在问了几个问题之后,现在我觉得自己更聪明了,为此我做了一个UDF:
Public Function GOOGLE_COUNT(searchTerm As String, xRes As Long, yRes As Long, Optional timeout As Long = 10) As Long
Dim url As String
Dim objIE As InternetExplorer
Dim currPage As HTMLDocument
Dim stTimer As Double, tElapsed As Single
Dim valueResult As IHTMLElementCollection
'create URL to page with these image criteria
url = "https://www.google.com/search?q=" & searchTerm & _
"&tbm=isch&source=lnt&tbs=isz:ex,iszw:" & xRes & ",iszh:" & yRes
'initiating a new instance of Internet Explorer and asigning it to objIE
Set objIE = New InternetExplorer
'Google images search
objIE.navigate url
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
Set currPage = objIE.document
Dim myDiv As HTMLDivElement: Set myDiv = currPage.getElementById("fbar")
Dim elemRect As IHTMLRect: Set elemRect = myDiv.getBoundingClientRect
stTimer = Timer
'Scroll until bottom of page is in view
Do Until elemRect.bottom > 0 Or tElapsed > timeout 'timeout after n seconds
currPage.parentWindow.scrollBy 0, 10000
Set elemRect = myDiv.getBoundingClientRect
tElapsed = Timer - stTimer
Loop
myDiv.ScrollIntoView
'Count the images
Set valueResult = currPage.getElementById("rg_s").getElementsByTagName("IMG")
GOOGLE_COUNT = valueResult.Length
objIE.Quit
End Function受雇于此:搜索1366:768大小的“圣玛丽”
=GOOGLE_COUNT("St. Mary", 1366, 768)或者是10秒超时(如果10秒过去了,搜索就停止滚动,只对加载的图像进行计数)。
=GOOGLE_COUNT("St. Mary", 1366, 768, 10)我在another question中解释了滚动是如何工作的,它现在是混乱的,但功能强大。
重要信息:
正如约翰·马金斯所指出的,重要的时间是在装货上,不算在内。特别是打开和关闭InternetExplorer。因此,为了避免大量的重新计算时间;如果(像我一样)您想检查一个以上的术语/分辨率,将这段代码放在宏中,而不是函数中(如果您认为我应该发布这个)。此UDF仅用于一次性搜索。
希望这是有用的,我想我应该重新审视这个问题,把我得到的答案贴出来。
最后注:
https://stackoverflow.com/questions/42373117
复制相似问题