
How do I stop getting values from two similarly named XML nodes when creating a custom object?

Asked by a Stack Overflow user on 2022-01-13 16:18:35

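I'm trying to parse several RSS news feeds, which I'll later filter based on what I'm looking for. Each feed has slightly different XML, but they generally have a title, description, link, and pubDate. Some use CDATA sections and some don't, so I added an if statement to handle the ones that do. I'm trying to write one routine that works for all of them. Here's a sample of the XML that's giving me a headache: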

Code language: xml
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:media="http://search.yahoo.com/mrss/" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title><![CDATA[ABC7 RSS Feed]]></title>
    <link><![CDATA[https://abc7news.com/feed]]></link>
    <lastBuildDate><![CDATA[Thu, 13 Jan 2022 15:49:04 +0000]]></lastBuildDate>
    <pubDate><![CDATA[Thu, 13 Jan 2022 15:49:04 +0000]]></pubDate>
    <description>Keep up with news from your local ABC station.</description>
    <copyright>Copyright 2022 ABC Inc., KGO-TV San Francisco</copyright>
    <managingEditor>KGO-TVWebTeam@email.disney.com(KGO-TV San Francisco)</managingEditor>
    <webMaster>KGO-TVWebTeam@email.disney.com(KGO-TV San Francisco)</webMaster>
    <language><![CDATA[en]]></language>
    <item>
      <title><![CDATA[Biden gives COVID response update; administration to deploy military teams to hospitals | LIVE]]></title>
      <description><![CDATA[Starting next week, 1,000 military medical personnel will begin arriving to help mitigate staffing crunches at hospitals across the country. ]]></description>
      <pubDate><![CDATA[Thu, 13 Jan 2022 15:38:02 +0000]]></pubDate>
      <link><![CDATA[https://abc7news.com/us-covid-biden-speech-today-hospitalizations/11462828/]]></link>
      <type><![CDATA[post]]></type>
      <guid><![CDATA[https://abc7news.com/us-covid-biden-speech-today-hospitalizations/11462828/]]></guid>
      <dc:creator><![CDATA[AP]]></dc:creator>
      <media:keywords><![CDATA[us covid, biden covid, biden speech today, covid hospitalizations, omicron variant, us hospitals, covid cases, covid omicron, biden military medical teams]]></media:keywords>
      <category><![CDATA[Health & Fitness,omicron variant,Coronavirus,military,joe biden,hospitals,u.s. & world]]></category>
      <guid isPermaLink="false">health/live-biden-highlighting-federal-surge-to-help-weather-omicron/11462828/</guid>
    </item>
    <item>
      <title><![CDATA[Massive backup on Bay Bridge after early morning crash]]></title>
      <description><![CDATA[A massive backup continues on the Bay Bridge after an earlier multi-vehicle crash past Treasure Island.]]></description>
      <pubDate><![CDATA[Thu, 13 Jan 2022 15:30:15 +0000]]></pubDate>
      <link><![CDATA[https://abc7news.com/bay-bridge-crash-traffic-accident-sf-commute/11463119/]]></link>
      <type><![CDATA[post]]></type>
      <guid><![CDATA[https://abc7news.com/bay-bridge-crash-traffic-accident-sf-commute/11463119/]]></guid>
      <dc:creator><![CDATA[KGO]]></dc:creator>
      <media:title><![CDATA[Crash triggers massive backup on Bay Bridge]]></media:title>
      <media:description><![CDATA[A crash on the Bay Bridge triggered massive gridlock for the Thursday morning commute.]]></media:description>
      <media:videoId>11463404</media:videoId>
      <media:thumbnail url="https://cdn.abcotvs.com/dip/images/11463261_011322-kgo-sky7-bay-bridge-traffic-img.jpg" width="1280" height="720" />
      <enclosure url="https://vcl.abcotv.net/video/kgo/011322-kgo-6am-bay-bridge-crash-vid.mp4" length="79" type="video/mp4" />
      <media:keywords><![CDATA[Bay Bridge crash, traffic, accident, SF commute, Oakland drive times, bay bridge toll plaza backup, Bay Area, treasure island,]]></media:keywords>
      <category><![CDATA[Traffic,Treasure Island,Oakland,San Francisco,CHP,bay bridge,crash]]></category>
      <guid isPermaLink="false">traffic/massive-backup-on-bay-bridge-after-early-morning-crash/11463119/</guid>
    </item>
  </channel>
</rss>

Here's the parsing code that puts each item into an object ($posts):

Code language: powershell
    $rss = [xml] (Get-Content 'I:\RSS_Project\Feeds\feed-3.xml')
    $rss.SelectNodes('//item')|%{
    $posts += New-Object psobject -Property @{
        Title = If($_.Title."#cdata-section"){$_.Title."#cdata-section"}else{$_.Title}
        Desc = If($_.description."#cdata-section"){$_.description."#cdata-section"}else{$_.description}
        link = If($_.link."#cdata-section"){$_.link."#cdata-section"}else{$_.link}
        pubDate = If($_.pubDate."#cdata-section"){$_.pubDate."#cdata-section"}else{$_.pubDate}
        
        }
    }

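I get the correct link and pubDate from this feed, but because some items also have a media:title and media:description (yes, inconsistently within the same feed), I end up with {title, media:title} in the $posts.title property of the custom object I create.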

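With this data, it would be {Massive backup on Bay Bridge after early morning crash, Crash triggers massive backup on Bay Bridge}. I can't figure out how to avoid capturing the media:title data. My other XML feeds don't have media:title.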

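Can I preemptively remove it if it exists in any feed? I tried taking only the first element of $_.Title, which works on this feed, but since the other feeds don't return an array there, it doesn't work on those. I have the same problem with media:description where it exists in an item. I output the data to an HTML table, and when the title or description is an array, the table just lists "System.Object". Any help keeping media:title out of my objects would be appreciated.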


1 Answer

Answered by a Stack Overflow user on 2022-01-13 17:19:22 (accepted answer)

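PowerShell's XML type adapter can be a bit "wonky" (for lack of a better technical term), because it tries to simplify something complex. As a result, it simply ignores namespace prefixes and resolves nodes by their local name, which causes $_.title to resolve to both the <title> and the <media:title> elements.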
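
To make that behavior concrete, here is a minimal sketch that contrasts adapter-style property access with an XPath lookup. The inline feed and the variable names ($xml, $viaAdapter, $viaXPath) are made up for illustration; only the media namespace URI comes from the feed in the question.

```powershell
# Hypothetical, cut-down feed: one item with both <title> and <media:title>
$xml = [xml]@'
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <item>
      <title><![CDATA[Plain title]]></title>
      <media:title><![CDATA[Media title]]></media:title>
    </item>
  </channel>
</rss>
'@

$item = $xml.SelectSingleNode('//item')

# Adapter-style access matches on the local name "title",
# so both <title> and <media:title> are returned
$viaAdapter = @($item.title)
"Adapter matched $($viaAdapter.Count) element(s)"

# In XPath 1.0 an unprefixed name test only matches elements in no namespace,
# so <media:title> is not selected here
$viaXPath = $item.SelectSingleNode('./title')
"XPath matched: $($viaXPath.InnerText)"
```

The array returned by the adapter is what later shows up as "System.Object" when the property is rendered into an HTML table.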

Instead, you can use XPath to resolve the values:

Code language: powershell
$fields = 'title','description','pubDate','link'

$posts = foreach($item in $rss.SelectNodes('//item')) {
    # create dictionary to hold properties of the object we want to construct
    $properties = [ordered]@{}

    # now let's try to resolve them all
    foreach($fieldName in $fields) {
        # use a relative XPath expression to extract relevant child node from current item
        $value = $item.SelectSingleNode("./${fieldName}")

        # handle content wrapped in CData
        if($value.HasChildNodes -and $value.ChildNodes[0] -is [System.Xml.XmlCDataSection]){
            $value = $value.ChildNodes[0]
        }

        # add node value to dictionary
        $properties[$fieldName] = $value.InnerText
    }

    # output resulting object
    [pscustomobject]$properties
}
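
As for the other part of the question (removing the elements up front), one possible approach is to delete the media:title and media:description nodes before using property access. The following is only a sketch under assumptions: the inline feed is made up, and the prefix name registered with the namespace manager is an arbitrary choice that has to be mapped to the namespace URI declared on <rss> in the real feed.

```powershell
# Hypothetical, cut-down feed for illustration
$rss = [xml]@'
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <item>
      <title><![CDATA[Plain title]]></title>
      <media:title><![CDATA[Media title]]></media:title>
    </item>
  </channel>
</rss>
'@

# Map the "media" prefix to its namespace URI so it can be used in XPath
$ns = New-Object System.Xml.XmlNamespaceManager($rss.NameTable)
$ns.AddNamespace('media', 'http://search.yahoo.com/mrss/')

# Copy the matches into an array first, then remove them from their parents
$mediaNodes = @($rss.SelectNodes('//item/media:title | //item/media:description', $ns))
foreach ($node in $mediaNodes) {
    [void]$node.ParentNode.RemoveChild($node)
}

# With the namespaced duplicate gone, property access yields a single element again
$item = $rss.SelectSingleNode('//item')
$item.title.'#cdata-section'
```

Feeds that do not contain any media:* elements simply produce an empty match list, so the loop does nothing for them.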
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/70699775
