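I am currently trying to parse a table from the Microsoft documentation (its GitHub version) to get proper PowerShell objects. I will share the relevant parts of the code so you can test it. It does parse what I want, but I would like the results to already be trimmed (no leading or trailing spaces or line breaks). I also have to get the results for "CNG Key Isolation", which is formatted differently. For that block only, my RegEx captures the line break, and I could not get it to work properly. I know I could do some parsing in PowerShell after the RegEx, but I would like to get better at using RegEx.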
My not yet optimized RegEx looks like this:
(?:^##\s*(?<ServiceTitle>[^\r\n#]*)[\r\n\s]*\|\s+Name\s+\|\s+Description\s+\|(?:[\r\n\s\|\-\*]+Service name[\|\*\s]+(?<ServiceName>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Description[\|\*\s]+(?<Description>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Installation[\|\*\s]+(?<Installation>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Startup type[\|\*\s]+(?<StartupType>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Recommendation[\|\*\s]+(?<Recommendation>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Comments[\|\*\s]+(?<Comments>[^\|]*?)(?: ?\|))*)
You can test it here: https://regex101.com/r/xQDRCO/1
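It should basically grab one data block per service and try to capture
"ServiceTitle","ServiceName","Description","Installation","StartupType","Recommendation","Comments"
regardless of the order they appear in, and even if one of them is missing. "ServiceTitle" is the special case: it must always be present.
Here is the PowerShell code I am currently testing: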
$fields = "ServiceTitle","ServiceName","Description","Installation","StartupType","Recommendation","Comments"
$RequestData = Invoke-WebRequest -UseBasicParsing -Uri https://raw.githubusercontent.com/MicrosoftDocs/windowsserverdocs/main/WindowsServerDocs/security/windows-services/security-guidelines-for-disabling-system-services-in-windows-server.md
$RegExMatches = [Regex]::Matches($RequestData.content,'(?:^##\s*(?<ServiceTitle>[^\r\n#]*)[\r\n\s]*\|\s+Name\s+\|\s+Description\s+\|(?:[\r\n\s\|\-\*]+Service name[\|\*\s]+(?<ServiceName>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Description[\|\*\s]+(?<Description>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Installation[\|\*\s]+(?<Installation>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Startup type[\|\*\s]+(?<StartupType>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Recommendation[\|\*\s]+(?<Recommendation>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Comments[\|\*\s]+(?<Comments>[^\|]*?)(?: ?\|))*)',[System.Text.RegularExpressions.RegexOptions]::Multiline)
$FullList = @()
foreach ($entry in $RegExMatches) {$ServiceAsObject = [pscustomobject]@{};foreach ($field in $fields) {$ServiceAsObject | Add-Member -MemberType NoteProperty -Name $field -Value $entry.Groups[$field].value};$FullList += $ServiceAsObject}
$FullList[15..17] # three items to show the problem I have with "CNG Key Isolation"
I do not use larger RegEx patterns very often, so feel free to give me feedback that helps me improve.
Thank you, 安迪尔
Posted on 2022-01-13 14:25:11
This may not be what you are looking for, but you could do something like the following to output an array of custom objects:
$output = switch -regex ($requestdata.content -split '\r?\n') {
    '^##\s' {
        # tracking empty lines since there is one under the service title
        # start new hash table when a new service is found
        # remove ## from service title names
        $emptyLineCount = 0
        $hash = [ordered]@{}
        $hash.ServiceTitle = $_ -replace '^##\s'
    }
    '\| \*\*' {
        # split on | and surrounding spaces
        # replace ** so name is cleaner
        if ($hash.ServiceTitle) {
            $key,$value = ($_ -split '\s*\|\s*' -replace '\*\*')[1,2]
            $hash[$key] = $value
        }
    }
    '^$' {
        # when second empty line is reached in a service block, output object
        if ($hash.ServiceTitle -and ++$emptyLineCount -eq 2) {
            [pscustomobject]$hash
        }
    }
}
# Finding a service by title
$output | Where ServiceTitle -eq 'CNG Key Isolation'
Splitting the content produces an array of lines, which I find easier to work with in a switch statement.
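As a small illustration of why the line-based split also yields trimmed values, here is a minimal sketch. The two rows in `$rows` are made-up sample data (one with a closing `|`, one without), and `$parsed` is just an illustrative variable name.

```powershell
# Hypothetical sample rows: one ends with a pipe, one does not
$rows = '| **Service name** | KeyIso |', '| **Service name** | KeyIso'

$parsed = foreach ($row in $rows) {
    # Splitting on the pipe plus any surrounding whitespace trims each cell;
    # element 0 is the empty string before the leading pipe
    $key, $value = ($row -split '\s*\|\s*' -replace '\*\*')[1,2]
    [pscustomobject]@{ Key = $key; Value = $value }
}

$parsed
```

Because only elements 1 and 2 of the split result are used, a missing closing pipe does not change the outcome for rows like these.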
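If the data is inconsistent, a purer regex solution makes things more fragile. The data block for CNG Key Isolation is missing the | at the end of each line, and it is the only block like that. So you now have to either match that special case or fix the data.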
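One possible way to take the "fix the data" route is to normalize the rows before applying a regex. This is only a sketch, assuming every table row starts with `|`; `$sample` is made-up data and `$fixed` is an illustrative name, neither comes from the original post.

```powershell
# Hypothetical sample: the first table row lacks its closing pipe
$sample = "| **Service name** | KeyIso`n| **Description** | Some text |"

# Append ' |' to every table row (a line starting with '|')
# that does not already end with a pipe
$fixed = ($sample -split '\r?\n' | ForEach-Object {
    if ($_ -match '^\|' -and $_ -notmatch '\|\s*$') {
        "$($_.TrimEnd()) |"
    } else {
        $_
    }
}) -join "`n"

$fixed
```

The normalized text could then be handed to a pattern that expects a trailing | on every row.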
$fields = "ServiceTitle","ServiceName","Description","Installation","StartupType","Recommendation","Comments"
$RequestData = Invoke-WebRequest -UseBasicParsing -Uri https://raw.githubusercontent.com/MicrosoftDocs/windowsserverdocs/main/WindowsServerDocs/security/windows-services/security-guidelines-for-disabling-system-services-in-windows-server.md
$regexString = '(?m)^##\s(?<ServiceTitle>.*)$(?s).*?\*\*Service name\*\* \| (?<ServiceName>.*?(?=\s+\|)).*?\*\*Description\*\* \| (?<Description>.*?(?=\s+\|)).*?\*\*Installation\*\* \| (?<Installation>.*?(?=\s+\|)).*?\*\*Startup type\*\* \| (?<StartupType>.*?(?=\s+\|)).*?\*\*Recommendation\*\* \| (?<Recommendation>.*?(?=\s+\|)).*?\*\*Comments\*\* \| (?<Comments>.*?(?=\s+\|))'
$out = $RequestData.Content |
    Select-String -Pattern $regexString -AllMatches |
    Foreach-Object { $_.Matches | Foreach-Object {
            $hash = [ordered]@{}
            foreach ($field in $fields) {
                $hash.$field = $_.Groups.where{$_.Name -eq $field}.Value
            }
            [pscustomobject]$hash
        }
    }
Posted on 2022-01-13 14:41:06
Assuming you have all of the text in $RequestData.content, I would not try to create one large regex to parse it into usable objects, but would do this instead:
# first split the tables from the rest of the text and work on the table lines only
$result = ($RequestData.content -split '(?m)^The following tables.*:')[-1].Trim() -split '(?m)^## ' |
    Where-Object { $_ -match '\S' } |
    ForEach-Object {
        # split each block to parse out the title and the table data
        $title, $table = ($_.Trim() -split '(\r?\n){2}', 2).Trim()
        # now remove the markdown stuff from the data and convert it using ConvertFrom-Csv
        $data = (($table -replace '(?m)^\|--\|--\||[*]{2}|^\||\|$' -replace '\s\|\s', '|') -split '\r?\n' -ne '').Trim() | ConvertFrom-Csv -Delimiter '|'
        # set up an ordered Hashtable to store the data
        $hash = [ordered]@{ServiceTitle = $title}
        foreach ($item in $data) {
            $hash[$item.Name] = $item.Description
        }
        # output real objects
        [PsCustomObject]$hash
    }
$result
https://stackoverflow.com/questions/70696733