Get website metadata such as title and description from a given URL using PowerShell

Stack Overflow user
Asked 2017-01-11 23:37:34
3 answers, 7.7K views, 0 followers, score 0

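How can I use PowerShell to retrieve website metadata (such as title, description, and keywords) from a given URL?

For example, given the following URLs:

Input: www.amazon.com

Output: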

title: "Amazon.com: Online Shopping for Electronics, Apparel, Computers, Books, DVDs & more",
description: "Online shopping from the earth's biggest selection of books, magazines, music, DVDs, videos, electronics, computers, software, apparel & accessories, shoes, jewelry, tools & hardware, housewares, furniture, sporting goods, beauty & personal care, broadband & dsl, gourmet food & just about anything else.",
keyword: "Amazon, Amazon.com, Books, Online Shopping, Book Store, Magazine, Subscription, Music, CDs, DVDs, Videos, Electronics, Video Games, Computers, Cell Phones, Toys, Games, Apparel, Accessories, Shoes, Jewelry, Watches, Office Products, Sports & Outdoors, Sporting Goods, Baby Products, Health, Personal Care, Beauty, Home, Garden, Bed & Bath, Furniture, Tools, Hardware, Vacuums, Outdoor Living, Automotive Parts, Pet Supplies, Broadband, DSL"

Input: www.youtube.com

Output:

title: YouTube
description: Enjoy the videos and music you love, upload original content and share it all with friends, family and the world on YouTube.
keywords: video, sharing, camera phone, video phone, free, upload

3 Answers

Stack Overflow user

Answered 2017-01-12 01:32:23

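Note: this only works in PowerShell 5.1 and earlier, because it relies on the ParsedHtml property, which later versions do not provide.

As @StephenP said:

There is no guarantee that the site you visit exposes the data you want in any practical way. You can easily retrieve a web page with Invoke-WebRequest and Invoke-RestMethod, but you then need to parse the returned headers/data.

There is also no guarantee that the site will not try to block what you are doing.

Below is an example that uses the .NET HTML DOM for parsing. @tim gives an example that finds the same information with a regex, but as he mentions, that depends on getting the regex right. On the other hand, the HTML DOM sometimes fails to parse a page too.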

# First retrieve the website
$result = Invoke-webrequest -Uri http://www.youtube.com/ -Method Get
$resultTable = @{}

# Get the title
$resultTable.title = $result.ParsedHtml.title

# Get the HTML Tag
$HtmlTag = $result.ParsedHtml.childNodes | Where-Object {$_.nodename -eq 'HTML'} 

# Get the HEAD Tag
$HeadTag = $HtmlTag.childNodes | Where-Object {$_.nodename -eq 'HEAD'}

# Get the Meta Tags
$MetaTags = $HeadTag.childNodes| Where-Object {$_.nodename -eq 'META'}

# You can view these using $metaTags | select outerhtml | fl 
# Get the value on content from the meta tag having the attribute with the name keywords
$resultTable.keywords = $metaTags  | Where-Object {$_.name -eq 'keywords'} | Select-Object -ExpandProperty content

# Do the same for description
$resultTable.description = $metaTags  | Where-Object {$_.name -eq 'description'} | Select-Object -ExpandProperty content

# Return the table we have built as an object
# (no Write-Output here: it would emit the words "New-Object", "-TypeName", ... as plain strings)
New-Object -TypeName PSCustomObject -Property $resultTable
Score: 4

Stack Overflow user

Answered 2017-01-12 01:31:57

You can use Invoke-WebRequest and then match the strings you need with a regex:

PS C:\> $response = Invoke-WebRequest -Uri www.amazon.com -UseBasicParsing

PS C:\> $response.Content -match "<title>(?<title>.*)</title>" | Out-Null

PS C:\> $matches['title']
Amazon.com: Online Shopping for Electronics, Apparel, Computers, Books, DVDs & more

PS C:\> $response.Content -match "<meta name=`"description`" content=`"(?<description>.*)`">" | Out-Null

PS C:\> $matches['description']
Online shopping from the earth's biggest selection of books, magazines, music, DVDs, videos, electronics, computers, software, apparel & accessories, shoes, jewelry, tools & hardware, housewares, furniture, sporting goods, beauty & personal care, broadband & dsl, gourmet food & just about anything else.

PS C:\> $response.Content -match "<meta name=`"keywords`" content=`"(?<keywords>.*)`">" | Out-Null

PS C:\> $matches['keywords']
Amazon, Amazon.com, Books, Online Shopping, Book Store, Magazine, Subscription, Music, CDs, DVDs, Videos, Electronics, Video Games, Computers, Cell Phones, Toys, Games, Apparel, Accessories, Shoes, Jewelry, Watches, Office Products, Sports & Outdoors, Sporting Goods, Baby Products, Health, Personal Care, Beauty, Home, Garden, Bed & Bath, Furniture, Tools, Hardware, Vacuums, Outdoor Living, Automotive Parts, Pet Supplies, Broadband, DSL

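This depends on every site using the same pattern for its meta fields. For example, the patterns above do not work on the Stack Overflow site, because it closes its meta tags with '/>'.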
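One way to loosen that dependency is to scan every meta tag and read its name and content attributes separately, so that attribute order, quote style, and a trailing '/>' no longer matter. The following is a minimal sketch, not part of the original answers: Get-PageMetadata is a hypothetical helper name, it works on an HTML string rather than a live response, and it is still regex-based, so unusual markup (for example a '>' inside an attribute value) can defeat it.

```powershell
# Hypothetical helper (not from the answers): pulls title, description and
# keywords out of an HTML string without relying on ParsedHtml.
function Get-PageMetadata {
    param(
        [Parameter(Mandatory)]
        [string]$Html
    )

    $meta = [ordered]@{ title = $null; description = $null; keywords = $null }

    # (?is): ignore case and let '.' span newlines inside <title>.
    if ($Html -match '(?is)<title[^>]*>(?<title>.*?)</title>') {
        $meta.title = [System.Net.WebUtility]::HtmlDecode($Matches['title'].Trim())
    }

    # Visit each <meta ...> tag, whether it ends with '>' or '/>'.
    foreach ($tag in [regex]::Matches($Html, '(?is)<meta\b[^>]*>')) {
        # Read the two attributes independently so their order does not matter.
        $name    = [regex]::Match($tag.Value, '(?i)\bname\s*=\s*["'']([^"'']*)["'']')
        $content = [regex]::Match($tag.Value, '(?is)\bcontent\s*=\s*(["''])(.*?)\1')

        if ($name.Success -and $content.Success) {
            $key = $name.Groups[1].Value.ToLowerInvariant()
            if ($key -in 'description', 'keywords') {
                # Decode entities such as &amp; in the attribute value.
                $meta[$key] = [System.Net.WebUtility]::HtmlDecode($content.Groups[2].Value)
            }
        }
    }

    [PSCustomObject]$meta
}

# Usage with an inline sample (made-up markup, for illustration only):
$sample = '<head><title>Tom &amp; Jerry</title>' +
          '<meta content="It''s a demo" name="description" />' +
          '<meta name="keywords" content="a, b"></head>'
Get-PageMetadata -Html $sample

# Against a live page (needs network access):
# Get-PageMetadata -Html (Invoke-WebRequest -Uri www.amazon.com -UseBasicParsing).Content
```

Because the function only takes a string, it does not depend on which PowerShell version fetched the page.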

Score: 3

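Stack Overflow user

Answered 2017-01-12 01:27:08

There is no guarantee that the site you visit exposes the data you want in any practical way. You can easily retrieve a web page with Invoke-WebRequest and Invoke-RestMethod, but you then need to parse the returned headers/data.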
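As an illustration of inspecting the returned headers before parsing anything, the sketch below checks that a response looks like a successful HTML page. Test-IsHtmlResponse is a hypothetical helper name, and the accepted content types are an assumption about what counts as "a web page". By default Invoke-WebRequest already reports non-success status codes as errors, so the status check mostly matters when that behavior has been suppressed.

```powershell
# Hypothetical helper (not from the answers): decides whether a response
# looks like a successful HTML page worth parsing for metadata.
function Test-IsHtmlResponse {
    param(
        [int]$StatusCode,
        [string]$ContentType
    )

    # Any 2xx status counts as success.
    $ok = ($StatusCode -ge 200) -and ($StatusCode -lt 300)

    # Assumption: only these two media types are treated as HTML pages.
    $isHtml = $ContentType -match '^\s*(text/html|application/xhtml\+xml)\b'

    $ok -and $isHtml
}

# Usage sketch (needs network access):
# $response = Invoke-WebRequest -Uri 'http://www.youtube.com/' -UseBasicParsing
# $type = @($response.Headers['Content-Type']) -join ''
# Test-IsHtmlResponse -StatusCode $response.StatusCode -ContentType $type
```

The header value is wrapped in @(...) and joined because, depending on the PowerShell version, it may come back as a single string or as a collection of strings.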

Score: 1
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/41602754
