Q: Trying to extract information from a web page

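Stack Overflow user

Asked 2018-10-13 03:48:28

Answers: 2 | Views: 56 | Followers: 0 | Votes: 1

I'm trying to extract data from a website. In my example, I run a search on Armorgames.com for the search term idle. From the results, I want to pull out the name of each game and put it into a CSV file for later use. My code: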

$SearchResult = Invoke-WebRequest 'http://armorgames.com/search?type=games&q=idle' 
($SearchResult.ParsedHtml.getElementsByTagName('H5') | Where { $_.pathname -like '/play*'})

Unfortunately, this outputs nothing. I can see the property names with the following command:

$SearchResult.ParsedHtml.getElementsByTagName('H5')

Using the tag 'a', I can find the games whose pathname contains 'play'. But I'm having trouble filtering the results and then writing them to a file.

powershell

html-object

powershell-v6.0

2 Answers

Stack Overflow user

Answered 2018-10-13 04:16:17

# list every anchor whose pathname starts with play/
$SearchResult.ParsedHtml.getElementsByTagName('a') | Where-Object -Property pathname -Like 'play/*'

# select property pathname
$SearchResult.ParsedHtml.getElementsByTagName('a') | 
    Where-Object -Property pathname -Like 'play/*' |
        Select-Object -Property pathname

# select property title
$SearchResult.ParsedHtml.getElementsByTagName('a') | 
    Where-Object -Property pathname -Like 'play/*' |
        Select-Object -Property title -Unique

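Votes: 0

Stack Overflow user

Answered 2018-10-16 06:33:57

Web-scraping code that is compatible with PowerShell Core (v6.0) and should also work in Windows PowerShell. It relies on a regex with the -match operator, because the ParsedHtml property is not available on Core: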

$SearchResult = Invoke-WebRequest 'http://armorgames.com/search?type=games&q=idle'
$GameNames = ($SearchResult.Content.split('<') | 
    where {$_ -match '^a href.*play.*\ title=.*>[A-Z].*'}) -replace '.*>'
$GameNames

The output looks like this:

Artist Idle
Hero Simulator: Idle Adventures
Idle Farmer
Idle Online Universe
Idle Sword
Idle Web Tycoon
Legendary Journey Idle
NGU IDLE
Religious Idle
Zombidle

Now that you have an array of the names you want, you should be able to create a CSV with whatever other information you need.
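For the CSV step, here is a minimal sketch, assuming the names are already in a string array like $GameNames above. The function name Export-GameName, the column name Name, and the temp-file path are my own placeholders, not anything from the site or the original code. The sample array reuses a few names from the output listed above so the snippet runs without a network call.

```powershell
# Hypothetical helper: wraps each name in an object so Export-Csv writes a "Name" column.
function Export-GameName {
    param(
        [Parameter(Mandatory)] [string[]] $GameNames,
        [Parameter(Mandatory)] [string]   $Path
    )
    $GameNames |
        Select-Object -Unique |                              # drop duplicate names, keep order
        ForEach-Object { [pscustomobject]@{ Name = $_ } } |  # one object per row
        Export-Csv -Path $Path -NoTypeInformation -Encoding UTF8
}

# Sample input (assumption); in practice pass the $GameNames array from the scrape.
$sample  = 'Artist Idle', 'Idle Farmer', 'Zombidle'
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'idle-games.csv'

Export-GameName -GameNames $sample -Path $csvPath

# Read the file back to check what was written.
Import-Csv -Path $csvPath
```

The wrapping step matters because piping plain strings straight into Export-Csv exports the strings' properties (a Length column) rather than the text itself. Extra columns can be added to the [pscustomobject] hashtable if you collect more data per game.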

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/52786141
