我在vba中编写了一些代码,以便从网页获得指向下一页的所有链接。下一页链接的最高数量是255。运行我的脚本,我会得到6906链接中的所有链接。这意味着循环一次又一次地运行,而我正在覆盖一些内容。过滤掉重复的链接,我可以看到254个唯一的链接在那里。我在这里的目标不是硬编码最高的页码到链接的迭代。以下是我所尝试的:
Sub YifyLink()
Const link = "https://www.yify-torrent.org/search/1080p/"
Dim http As New XMLHTTP60, html As New HTMLDocument, htm As New HTMLDocument
Dim x As Long, y As Long, item_link as String
With http
.Open "GET", link, False
.send
html.body.innerHTML = .responseText
End With
For Each post In html.getElementsByClassName("pager")(0).getElementsByTagName("a")
If InStr(post.innerText, "Last") Then
x = Split(Split(post.href, "-")(1), "/")(0)
End If
Next post
For y = 0 To x
item_link = link & "t-" & y & "/"
With http
.Open "GET", item_link, False
.send
htm.body.innerHTML = .responseText
End With
For Each posts In htm.getElementsByClassName("pager")(0).getElementsByTagName("a")
I = I + 1: Cells(I, 1) = posts.href
Next posts
Next y
End Sub链接所在的元素:
<div class="pager"><a href="/search/1080p/" class="current">1</a> <a href="/search/1080p/t-2/">2</a> <a href="/search/1080p/t-3/">3</a> <a href="/search/1080p/t-4/">4</a> <a href="/search/1080p/t-5/">5</a> <a href="/search/1080p/t-6/">6</a> <a href="/search/1080p/t-7/">7</a> <a href="/search/1080p/t-8/">8</a> <a href="/search/1080p/t-9/">9</a> <a href="/search/1080p/t-10/">10</a> <a href="/search/1080p/t-11/">11</a> <a href="/search/1080p/t-12/">12</a> <a href="/search/1080p/t-13/">13</a> <a href="/search/1080p/t-14/">14</a> <a href="/search/1080p/t-15/">15</a> <a href="/search/1080p/t-16/">16</a> <a href="/search/1080p/t-17/">17</a> <a href="/search/1080p/t-18/">18</a> <a href="/search/1080p/t-19/">19</a> <a href="/search/1080p/t-20/">20</a> <a href="/search/1080p/t-21/">21</a> <a href="/search/1080p/t-22/">22</a> <a href="/search/1080p/t-23/">23</a> <a href="/search/1080p/t-2/">Next</a> <a href="/search/1080p/t-255/">Last</a> </div>我得到的结果(部分):
about:/search/1080p/t-20/
about:/search/1080p/t-21/
about:/search/1080p/t-22/
about:/search/1080p/t-23/
about:/search/1080p/t-255/发布于 2017-07-28 18:52:55
这个想法应该是在循环中刮页,如果不是真的,找一些比较的东西,然后退出循环。
这可能是这样的,即根据字典检查密钥,或者检查元素是否存在,或者检查可能特定于您的问题的任何其他逻辑。
例如,在这里,您的问题是,站点继续显示后一个页面的255页。这对我们来说是条线索。我们可以比较属于页(n)的元素和属于页(n-1)的元素。
例如,如果256页中的元素与第255页中的元素相同,则退出循环/sub。请参阅下面的示例代码:
Sub yify()
Const mlink = "https://www.yify-torrent.org/search/1080p/t-"
Dim http As New XMLHTTP60, html As New HTMLDocument
Dim post As Object, posts As Object
Dim pageno As Long, rowno As Long
pageno = 1
rowno = 1
Do
With http
.Open "GET", mlink & pageno & "/", False
.send
html.body.innerHTML = .responseText
End With
Set posts = html.getElementsByClassName("mv")
If Cells(rowno, 1) = posts(17).getElementsByTagName("a")(0).innerText Then Exit Do
For Each post In posts
With post.getElementsByTagName("div")
If .Length Then
rowno = rowno + 1
Cells(rowno, 1) = .Item(0).innerText
End If
End With
Next post
Debug.Print "pageno: " & pageno & " completed."
pageno = pageno + 1
Loop
End Subhttps://stackoverflow.com/questions/45362363
复制相似问题