我试图从网站中提取一些数据,但由于我是网络的新手,所以在标签名、类代码和ID中混淆了。我对此只有基本的了解。我希望在下面复制数据,如果数据不存在,则应该将单元格保留为空白,代码需要在下一个值中移动。
Class="container size" - 5*5,5*10 kind of value
Class="description" - Standard in this case also need to copy like Drive-up Access
Class="offer1" & "offer2" - Call for Availability
Class="price"我试着设计一个代码,但无法准确判断需要选择哪个标签名,下面是我的代码,请帮助我复制这些数据。
Dim ie As New InternetExplorer, ws As Worksheet
Set ws = ThisWorkbook.Worksheets("Sheet1")
With ie
.Visible = True
.Navigate2 "" & Sheets("Home").Range("C3").Text
While .Busy Or .readyState < 4: DoEvents: Wend
Sheets("Unit Data").Select
Dim listings As Object, listing As Object, headers(), results()
Dim r As Long, list As Object, item As Object
headers = Array("size", "features")
Set list = .document.getElementsByClassName("units-table main")
'.unit_size medium, .features, .promo_offers, .board_rate_wrapper p, .board_rate
Dim rowCount As Long
rowCount = .document.querySelectorAll(".units-table main li").Length
ReDim results(1 To rowCount, 1 To UBound(headers) + 1)
For Each listing In list
For Each item In listing.getElementsByTagName("li")
r = r + 1
On Error Resume Next
results(r, 1) = item.getElementsByClassName("container size")(0).innerText
results(r, 2) = item.getElementsByClassName("description")(0).innerText
On Error GoTo 0
Next
Next
ws.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
ws.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
.Quit
End With发布于 2019-04-06 16:50:13
XHR:
所有信息都可以通过XMLHTTP (XHR)请求获得--比打开浏览器要快得多。
我首先使用.main li[class]的css选择器检索行计数。"."是类选择器,li是类型选择器,[class]是属性选择器。中间的空格" "是后代组合子。它指定要检索具有类属性的所有li标记/类型元素,其父元素的类名为main。
这一匹配情况如下:

如您所见,这给出了行数;要为结果集检索信息的父li元素的数量。
li元素的这个集合由querySelectorAll作为nodeList返回。我不能循环这个列表,将getElementsByClassName / querySelector应用于单个节点,因为li元素不公开我可以使用的任何方法。
现在,由于我没有使用浏览器,所以我不得不依赖于HTMLDocument对象可用的方法。与浏览器不同的是,当通过VBA自动化时,我无法访问它们支持的有限的伪类选择器,这将允许我使用选择器语法(如:nth-of-type )来访问各个行。这是一个恼人的限制,网络刮刮与VBA。
那么,我们能做什么呢?在这个例子中,我可以将每个节点的innerHTML转储到另一个HTMLDocument变量html2中,这样就可以访问该对象的querySelector/querySelectorAll方法。然后,HTML将仅限于当前的li。
如果我们查看所讨论的HTML:

我们可以看到,li元素是通用的兄弟关系。他们并排坐在一起。当我循环我的nodeList listings时,我将innerHTML从当前节点传输到html2;第二个HTMLDocument变量。
值得注意的是,我可能会使用children降低每个列表,例如:
listings.item(i).Children(2)......然后,我可以在newLines等上进行拆分,以便访问所有信息。不过,我认为我的方法更快,更健壮。
VBA:
Option Explicit
Public Sub GetInfo()
Dim ws As Worksheet, html As HTMLDocument, s As String
Const URL As String = "https://www.neighborhoodselfstorage.net/self-storage-delmar-md-f1426"
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set html = New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
s = .responseText
html.body.innerHTML = s
Dim headers(), results(), listings As Object, amenities As String
headers = Array("Size", "Description", "Amenities", "Offer1", "Offer2", "RateType", "Price")
Set listings = html.querySelectorAll(".main li[class]")
Dim rowCount As Long, numColumns As Long, r As Long, c As Long
Dim icons As Object, icon As Long, amenitiesInfo(), i As Long, item As Long
rowCount = listings.Length
numColumns = UBound(headers) + 1
ReDim results(1 To rowCount, 1 To numColumns)
Dim html2 As HTMLDocument
Set html2 = New HTMLDocument
For item = 0 To listings.Length - 1
r = r + 1
html2.body.innerHTML = listings.item(item).innerHTML
'size,description, amenities,specials offer1 offer2, rate type, price
results(r, 1) = Trim$(html2.querySelector(".size").innerText)
results(r, 2) = Trim$(html2.querySelector(".description").innerText)
Set icons = html2.querySelectorAll("i[title]")
ReDim amenitiesInfo(0 To icons.Length - 1)
For icon = 0 To icons.Length - 1
amenitiesInfo(icon) = icons.item(icon).getAttribute("title")
Next
amenities = Join$(amenitiesInfo, ", ")
results(r, 3) = amenities
results(r, 4) = html2.querySelector(".offer1").innerText
results(r, 5) = html2.querySelector(".offer2").innerText
results(r, 6) = html2.querySelector(".rate-label").innerText
results(r, 7) = html2.querySelector(".price").innerText
Next
ws.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
ws.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End With
End SubInternet:
假设没有从给定的url重定向。在这里,我使用:n个类型的伪类选择器来针对清单的每一行。这些行是保存每个框列表的信息的li (list)元素。我构建了一个css选择器字符串,该字符串指定行,然后指定我所要查找的行中的元素。我将该字符串传递给querySelector,或返回匹配元素/s的querySelectorAll。
Option Explicit
Public Sub UseIE()
Dim ie As New InternetExplorerm, ws As Worksheet
Const Url As String = "https://www.neighborhoodselfstorage.net/self-storage-delmar-md-f142"
Set ws = ThisWorkbook.Worksheets("Sheet1")
With ie
.Visible = True
.Navigate2 Url
While .Busy Or .readyState < 4: DoEvents: Wend
Dim headers(), results(), listings As Object, listing As Object, amenities As String
headers = Array("Size", "Description", "Amenities", "Offer1", "Offer2", "RateType", "Price")
Set listings = .document.querySelectorAll(".main li[class]")
Dim rowCount As Long, numColumns As Long, r As Long, c As Long
Dim icons As Object, icon As Long, amenitiesInfo(), i As Long
rowCount = listings.Length
numColumns = UBound(headers) + 1
ReDim results(1 To rowCount, 1 To numColumns)
For Each listing In listings
r = r + 1
'size,description, amenities,specials offer1 offer2, rate type, price
With .document
results(r, 1) = Trim$(.querySelector(".main li:nth-of-type(" & r & ") .size").innerText)
results(r, 2) = Trim$(.querySelector(".main li:nth-of-type(" & r & ") .description").innerText)
Set icons = .querySelectorAll("." & Join$(Split(listing.className, Chr$(32)), ".") & ":nth-of-type(" & r & ") i[title]")
ReDim amenitiesInfo(0 To icons.Length - 1)
For icon = 0 To icons.Length - 1
amenitiesInfo(icon) = icons.item(icon).getAttribute("title")
Next
amenities = Join$(amenitiesInfo, ",")
results(r, 3) = amenities
results(r, 4) = .querySelector(".main li:nth-of-type(" & r & ") .offer1").innerText
results(r, 5) = .querySelector(".main li:nth-of-type(" & r & ") .offer2").innerText
results(r, 6) = .querySelector(".main li:nth-of-type(" & r & ") .rate-label").innerText
results(r, 7) = .querySelector(".main li:nth-of-type(" & r & ") .price").innerText
End With
Next
.Quit
ws.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
ws.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End With
End Sub引用(VBE > Tools >Reference):
https://stackoverflow.com/questions/55549841
复制相似问题