Does anyone know how to get the link inside the child node of the first div?
The page looks like this:
<div id="id1" class="class-1 class-2 class-3 class-4 class-5 class-6 class-7">
<div class="class 8 class 9">
<h3><a href="http://foo.com/1">foo.com/1</a></h3>
</div>
</div>
<div id="id2" class="class-1 class-2 class-3 class-4 class-5 class-6 class-7">
<div class="class 8 class 9">
<h3><a href="http://foo.com/2">foo.com/2</a></h3>
</div>
</div>
<div id="id3" class="class-1 class-2 class-3 class-4 class-5 class-6 class-7">
<div class="class 8 class 9">
<h3><a href="http://foo.com/3">foo.com/3</a></h3>
</div>
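</div>
I want to get the first div, but its id changes every time I navigate.
So I need code that gets the first div on the page and then the link inside its child element, so that the WebBrowser can navigate to that link.
Here is what I have tried: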
WebBrowser1.Navigate("http://foo.com/home")
WebBrowser1.
Posted on 2015-01-13 03:45:36
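The code below retrieves the link in the first <h3><a href="http://foo.com/1">foo.com/1</a></h3>. It does this by taking the first http:// URL that appears in the page source, so it assumes no other http:// URL comes earlier in the page.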
Dim html As String
Using wc As New WebClient()
    ' Download the page source (URL taken from the question; substitute your own)
    html = wc.DownloadString("http://foo.com/home")
End Using
Dim re1 As String = ".*?" 'Non-greedy match on filler
Dim re2 As String = "(http)" 'Literal "http"
Dim re3 As String = "(:)" 'Literal colon
Dim re4 As String = "(\/)" 'First slash
Dim re5 As String = "((?:\/[\w\.\-]+)+)" 'Second slash, host and path, e.g. /foo.com/1
Dim r As New Regex(re1 & re2 & re3 & re4 & re5, RegexOptions.IgnoreCase Or RegexOptions.Singleline)
' Match returns only the first match, i.e. the first http:// URL in the page
Dim m As Match = r.Match(html)
If m.Success Then
    ' Reassemble the URL from the four captured groups
    Dim url As String = m.Groups(1).Value & m.Groups(2).Value & m.Groups(3).Value & m.Groups(4).Value
    ' Do whatever you need with the URL here, e.g. WebBrowser1.Navigate(url)
End If
The code is adapted from output of the online regular-expression tool txt2re, which is the source.
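To check the pattern without hitting the network, it can be run against a shortened copy of the markup from the question. This is only a sketch: the module name FirstLinkDemo and the helper FirstHttpUrl are made up for illustration, and the sample string is trimmed from the question's HTML.

```vb
Imports System
Imports System.Text.RegularExpressions

Module FirstLinkDemo
    ' Hypothetical helper: returns the first http:// URL found in html,
    ' or Nothing when the pattern does not match.
    Function FirstHttpUrl(html As String) As String
        Dim r As New Regex(".*?(http)(:)(\/)((?:\/[\w\.\-]+)+)",
                           RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        Dim m As Match = r.Match(html)
        If Not m.Success Then Return Nothing
        ' Groups 1 to 4 are "http", ":", "/" and "/host/path"
        Return m.Groups(1).Value & m.Groups(2).Value & m.Groups(3).Value & m.Groups(4).Value
    End Function

    Sub Main()
        ' Sample markup shortened from the question (two of the three divs)
        Dim html As String =
            "<div id=""id1"" class=""class-1""><div class=""class 8 class 9"">" &
            "<h3><a href=""http://foo.com/1"">foo.com/1</a></h3></div></div>" &
            "<div id=""id2"" class=""class-1""><div class=""class 8 class 9"">" &
            "<h3><a href=""http://foo.com/2"">foo.com/2</a></h3></div></div>"
        Console.WriteLine(FirstHttpUrl(html)) ' prints http://foo.com/1
    End Sub
End Module
```

Because Regex.Match stops at the first match, the changing id of the div does not matter here. The flip side is that any http:// URL placed earlier in the real page (for example in the head) would be returned instead.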
Note that you need access to the Net and RegularExpressions namespaces, so you also need:
Imports System.Text.RegularExpressions
Imports System.Net
https://stackoverflow.com/questions/27907903