
Q: Using the Cocke-Younger-Kasami (CYK) algorithm in XSLT to transform XML into a new XML file

Stack Overflow user

Asked 2015-10-26 07:59:37

Answers: 1 | Views: 181 | Followers: 0 | Votes: 0

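For any information on the CYK algorithm in XSLT, see the link below. I have two input XML files, shown below. I have to pass sentance.xml to the XSLT, look up each word at run time in Rule.xml, and generate the new XML shown below. Only XSLT, XPath and XML may be used, no other language or extension keywords.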


1) sentance.xml

<?xml version="1.0" encoding="UTF-8"?>
<sentances>
  <s>dog bark</s>
  <s>cat drink milk</s>
</sentances>

2) Rule.xml

<?xml version="1.0" encoding="UTF-8"?>
<allrules>
<rules>
    <rule cat="s">
        <rulechild cat="np"/>
        <rulechild cat="vp"/>
    </rule>
    <rule cat="vp">
        <rulechild cat="vt"/>
        <rulechild cat="np"/>
    </rule>
    <rule cat="vp">
        <rulechild cat="vi"/>
    </rule> 
</rules>
<words>
    <word cat="vi">bark</word>
    <word cat="vt">drink</word>
    <word cat="pn">dog</word>
    <word cat="pn">cat</word>
    <word cat="pn">milk</word>
</words>
</allrules>

The output XML should look like this:

<trees>
<tree>
    <sentace>dog bark</sentace>
    <node cat="s">
        <node cat="np">
            <word cat="pn">dog</word>
        </node>
        <node cat="vp">
            <word cat="vi">bark</word>
        </node>
    </node>
</tree>
<tree>
    <sentace>cat drink milk</sentace>
    <node cat="s">
        <node cat="np">
            <word cat="pn">cat</word>
        </node>
        <node cat="vp">
            <word cat="vt">drink</word>
            <node cat="np">
                <word cat="pn">milk</word>
            </node>
        </node>
    </node>
</tree>
</trees>

Is it possible to implement the CYK algorithm in XSLT and generate the output above?
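CYK expects a grammar in Chomsky normal form: every rule has exactly two children, and each word already carries the categories those binary rules use. Rule.xml is not quite there: vp → vi is a unary rule, and the nouns are tagged pn while rule s asks for np (the expected output, where np dominates a pn word, implies an unstated rule np → pn). A minimal Python sketch, meant only as a reference model and not as a replacement for XSLT, of folding those unary rules into each word's categories before running CYK. The names UNARY_RULES, unary_closure and preterminals are illustrative, and np → pn is an assumption:

```python
# Binary rules and lexicon taken from Rule.xml.
BINARY_RULES = [("s", "np", "vp"), ("vp", "vt", "np")]  # (parent, left, right)
# Unary rules as (parent, child). vp -> vi is in Rule.xml;
# ASSUMED: np -> pn is not in Rule.xml, it is inferred from the expected output.
UNARY_RULES = [("vp", "vi"), ("np", "pn")]
LEXICON = {"bark": {"vi"}, "drink": {"vt"},
           "dog": {"pn"}, "cat": {"pn"}, "milk": {"pn"}}


def unary_closure(cats: set[str]) -> set[str]:
    """Add every category reachable upwards through unary rules."""
    result = set(cats)
    changed = True
    while changed:
        changed = False
        for parent, child in UNARY_RULES:
            if child in result and parent not in result:
                result.add(parent)
                changed = True
    return result


def preterminals(word: str) -> set[str]:
    """Categories a single word can have once unary rules are applied."""
    return unary_closure(LEXICON.get(word, set()))


if __name__ == "__main__":
    for w in "dog bark cat drink milk".split():
        print(w, sorted(preterminals(w)))
```

With this closure, bark ends up as both vi and vp, and the nouns as both pn and np, so only the binary rules remain for the CYK table itself.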

Tags: algorithm, xml, xslt, xpath

1 Answer

Stack Overflow user

Answered 2015-10-29 05:34:27

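Here is a solution that should come close to what you asked for. You did not specify an XSLT version, so this uses XSLT 3.0. I have embedded the rules and the word symbols in the stylesheet, but you can easily adapt them to be external documents. Because this CYK implementation only handles binary rules ("All rules have precisely 2 children"), the grammar differs slightly from yours: the unary rule vp → vi is folded into the word list (bark is tagged both vi and vp), and the nouns are tagged np instead of pn.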

If XSLT 3.0 is not available to you, you can replace fold-left() with tail recursion.
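The fold-left() call in the stylesheet accumulates the whole sequence of CYK rows and appends one new row per step; its second argument $b is never used, so only the number of iterations matters. A small Python model of that equivalence, where next_row is a hypothetical stand-in for so:next-analysis that only reproduces the row sizes, shows that the fold and a tail-recursive function build the same rows. The recursive shape is what a recursive xsl:function would look like in XSLT 2.0:

```python
from functools import reduce


def next_row(rows: list[list[int]]) -> list[int]:
    """Hypothetical stand-in for so:next-analysis: one cell fewer than the last row."""
    return [0] * (len(rows[-1]) - 1)


def rows_by_fold(first_row: list[int]) -> list[list[int]]:
    # Mirrors fold-left(2 to n, $first-row, function($a, $b) { $a, next($a) }):
    # the accumulator is the list of all rows so far; the item _b is unused.
    n = len(first_row)
    return reduce(lambda acc, _b: acc + [next_row(acc)], range(2, n + 1), [first_row])


def rows_by_recursion(rows: list[list[int]], remaining: int) -> list[list[int]]:
    # Tail-recursive equivalent: append one row and recurse until none remain.
    if remaining == 0:
        return rows
    return rows_by_recursion(rows + [next_row(rows)], remaining - 1)


if __name__ == "__main__":
    first = [0, 0, 0]
    print([len(r) for r in rows_by_fold(first)])                    # [3, 2, 1]
    print(rows_by_fold(first) == rows_by_recursion([first], len(first) - 1))  # True
```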

This input document...

<sentences>
  <sentence>dog bark</sentence>
  <sentence>cat drink milk</sentence>
</sentences>

...when fed into this XSLT 3.0 stylesheet...

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:fn="http://www.w3.org/2005/xpath-functions"
  xmlns:so="http://stackoverflow.com/questions/33340967"
  version="3.0"
  exclude-result-prefixes="xsl xs fn so">

<xsl:output encoding="utf-8" omit-xml-declaration="yes" indent="yes" />
<xsl:strip-space elements="*" />

<xsl:variable name="rules" as="element(rule)*">
    <rule cat="s">
        <!-- All rules have precisely 2 children. -->
        <rulechild cat="np"/>
        <rulechild cat="vp"/>
    </rule>
    <rule cat="vp">
        <rulechild cat="vt"/>
        <rulechild cat="np"/>
    </rule>
</xsl:variable>

<xsl:variable name="words" as="element(word)+">
    <word cat="vi">bark</word>
    <word cat="vp">bark</word>
    <word cat="vt">drink</word>
    <word cat="np">dog</word>
    <word cat="np">cat</word>
    <word cat="np">milk</word>
</xsl:variable>

<!--
  The n'th analysis contains the CYK analysis for symbol sequences of length n.
  Let there be s symbols in the sentence.
  analysis[1] has s children.
  analysis[s] has one child.
  analysis[n] has s - n + 1 children.
  The children of analysis are node elements and nothing else.
  A node element represents a cell in the CYK analysis. It covers either a single word or a string of words.
  The index of the node within its parent analysis corresponds to the start position:
    it equals the position, within the sentence, of the starting word.
  A node has any number of children, all of which are permutation elements.
  A permutation represents a possible value for the node content, the competing alternatives
   being all the sibling permutations. Thus if a node has no permutations, no sequence of
   the given length at that position in the sentence can be derived from the grammar.
  Each permutation has as children either 1 word or 2 nodes.
  The permutations in the first row (analysis[1]) are all of the word type.
  Subsequent rows have permutations of the 2-node type.
  words and permutations all have an attribute cat, which is the symbol.
-->

<xsl:function name="so:analysis-1" as="element(analysis)"> 
  <!-- Do the first row of CYK. -->
  <xsl:param name="sentence" as="xs:string" />
  <analysis>
    <xsl:analyze-string select="$sentence" regex="\w+">
      <xsl:matching-substring>
        <xsl:variable name="word" select="." />
        <node>
          <xsl:for-each select="$words[. eq $word]">
            <permutation cat="{@cat}"> 
              <word cat="{@cat}"><xsl:value-of select="$word" /></word>
            </permutation>
          </xsl:for-each>
        </node>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </analysis>
</xsl:function>

<xsl:function name="so:next-analysis" as="element(analysis)"> 
  <!-- Given the first n rows of CYK, compute the n+1'th row. -->
  <xsl:param name="rows" as="element(analysis)+" />
  <xsl:variable name="word-count" select="count( $rows[1]/node)" as="xs:integer" />
  <xsl:variable name="node-count" select="count( $rows[last()]/node) - 1" as="xs:integer" />
  <xsl:variable name="seq-len"    select="$word-count - $node-count + 1" as="xs:integer" />
  <analysis>
    <xsl:for-each select="1 to $node-count">
      <xsl:variable name="index" select="." as="xs:integer" />
      <node>
        <xsl:for-each select="$rules">
          <xsl:variable name="rule" as="element(rule)" select="." />
          <xsl:for-each select="
            for $sub-a in 1 to $seq-len - 1 return $sub-a
                [$rows[$sub-a           ]/node[$index         ][permutation/@cat = $rule/rulechild[1]/@cat]]
                [$rows[$seq-len - $sub-a]/node[$index + $sub-a][permutation/@cat = $rule/rulechild[2]/@cat]]">            
            <xsl:variable name="sub-a"    select="." as="xs:integer" />
            <permutation cat="{$rule/@cat}">
              <node>
                <xsl:copy-of select="$rows[$sub-a]/node[$index]/permutation[@cat eq $rule/rulechild[1]/@cat]" />
              </node>
              <node>
                <xsl:copy-of select="$rows[$seq-len - $sub-a]/node[$index + $sub-a]/permutation[@cat eq $rule/rulechild[2]/@cat]" />
              </node>
            </permutation>
          </xsl:for-each>     
        </xsl:for-each>   
      </node>
    </xsl:for-each>
  </analysis>
</xsl:function>

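<xsl:template match="@*|node()">
  <!-- Identity copy: attributes must be selected explicitly, or they are lost. -->
  <xsl:copy>
    <xsl:apply-templates select="@*|node()" />
  </xsl:copy>
</xsl:template>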

<xsl:template match="sentences">
  <trees>
    <xsl:apply-templates />
  </trees>
</xsl:template>

<xsl:template match="sentence">
  <tree>
    <xsl:variable name="first-row"  select="so:analysis-1(.)" />
    <xsl:variable name="word-count" select="count( $first-row/node)" as="xs:integer" />
    <!-- Build rows 2..n; each step appends so:next-analysis($a) to the rows so far. -->
    <xsl:sequence select="fold-left( 2 to $word-count, $first-row, function($a, $b) { $a, so:next-analysis($a) })
      [last()]" />
  </tree>
</xsl:template>

</xsl:stylesheet>

...produces this output...

<trees>
   <tree>
      <analysis>
         <node>
            <permutation cat="s">
               <node>
                  <permutation cat="np">
                     <word cat="np">dog</word>
                  </permutation>
               </node>
               <node>
                  <permutation cat="vp">
                     <word cat="vp">bark</word>
                  </permutation>
               </node>
            </permutation>
         </node>
      </analysis>
   </tree>
   <tree>
      <analysis>
         <node>
            <permutation cat="s">
               <node>
                  <permutation cat="np">
                     <word cat="np">cat</word>
                  </permutation>
               </node>
               <node>
                  <permutation cat="vp">
                     <node>
                        <permutation cat="vt">
                           <word cat="vt">drink</word>
                        </permutation>
                     </node>
                     <node>
                        <permutation cat="np">
                           <word cat="np">milk</word>
                        </permutation>
                     </node>
                  </permutation>
               </node>
            </permutation>
         </node>
      </analysis>
   </tree>   
</trees>

Caveat: I have not tested this.
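Since the stylesheet is untested, one way to check what it ought to produce is a reference model of the same chart. The Python sketch below is not part of the answer, and its function names are hypothetical; it uses the rules and words embedded in the stylesheet and the same row layout (row n has s - n + 1 cells, and split plays the role of $sub-a). It records only the categories in each cell, not the permutation subtrees:

```python
# Grammar and lexicon as embedded in the answer's stylesheet.
RULES = [("s", "np", "vp"), ("vp", "vt", "np")]  # (parent, left child, right child)
WORDS = [("vi", "bark"), ("vp", "bark"), ("vt", "drink"),
         ("np", "dog"), ("np", "cat"), ("np", "milk")]


def cyk_chart(sentence: str) -> list[list[set[str]]]:
    """chart[length - 1][start] = categories spanning `length` words from `start`."""
    tokens = sentence.split()
    s = len(tokens)
    # Row 1 (like so:analysis-1): look each token up in the word list.
    chart = [[{cat for cat, w in WORDS if w == tok} for tok in tokens]]
    # Rows 2..s (like so:next-analysis): row n has s - n + 1 cells.
    for length in range(2, s + 1):
        row = []
        for start in range(s - length + 1):
            cell = set()
            for parent, left, right in RULES:
                for split in range(1, length):  # split corresponds to $sub-a
                    left_cell = chart[split - 1][start]
                    right_cell = chart[length - split - 1][start + split]
                    if left in left_cell and right in right_cell:
                        cell.add(parent)
            row.append(cell)
        chart.append(row)
    return chart


def parses_as(sentence: str, goal: str = "s") -> bool:
    """True if the single top cell of the chart contains the goal category."""
    top = cyk_chart(sentence)[-1]
    return len(top) == 1 and goal in top[0]


if __name__ == "__main__":
    for text in ["dog bark", "cat drink milk", "bark dog"]:
        print(text, parses_as(text))
```

For "cat drink milk" the second row is [set(), {"vp"}]: "cat drink" matches no rule, while "drink milk" is a vp, which is exactly what lets the top cell become s.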

Alternative

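If you are not interested in all the permutations and only want any one of them (the first), we can add a few templates that strip out all permutations but one.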

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:fn="http://www.w3.org/2005/xpath-functions"
  xmlns:so="http://stackoverflow.com/questions/33340967"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  version="3.0"
  exclude-result-prefixes="xsl xs fn so">

<xsl:output encoding="utf-8" omit-xml-declaration="yes" indent="yes" />
<xsl:strip-space elements="*" />

<xsl:variable name="rules" as="element(rule)*">
    <rule cat="s">
        <!-- All rules have precisely 2 children. -->
        <rulechild cat="np"/>
        <rulechild cat="vp"/>
    </rule>
    <rule cat="vp">
        <rulechild cat="vt"/>
        <rulechild cat="np"/>
    </rule>
</xsl:variable>

<xsl:variable name="words" as="element(word)+">
    <word cat="vi">bark</word>
    <word cat="vp">bark</word>
    <word cat="vt">drink</word>
    <word cat="np">dog</word>
    <word cat="np">cat</word>
    <word cat="np">milk</word>
</xsl:variable>

<xsl:function name="so:analysis-1" as="element(analysis)"> 
  <!-- Do the first row of CYK. -->
  <xsl:param name="sentence" as="xs:string" />
  <analysis>
    <xsl:analyze-string select="$sentence" regex="\w+">
      <xsl:matching-substring>
        <xsl:variable name="word" select="." />
        <node>
          <xsl:for-each select="$words[. eq $word]">
            <permutation cat="{@cat}"> 
              <word cat="{@cat}"><xsl:value-of select="$word" /></word>
            </permutation>
          </xsl:for-each>
        </node>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </analysis>
</xsl:function>

<xsl:function name="so:next-analysis" as="element(analysis)"> 
  <!-- Given the first n rows of CYK, compute the n+1'th row. -->
  <xsl:param name="rows" as="element(analysis)+" />
  <xsl:variable name="word-count" select="count( $rows[1]/node)" as="xs:integer" />
  <xsl:variable name="node-count" select="count( $rows[last()]/node) - 1" as="xs:integer" />
  <xsl:variable name="seq-len"    select="$word-count - $node-count + 1" as="xs:integer" />
  <analysis>
    <xsl:for-each select="1 to $node-count">
      <xsl:variable name="index" select="." as="xs:integer" />
      <node>
        <xsl:for-each select="$rules">
          <xsl:variable name="rule" as="element(rule)" select="." />
          <xsl:for-each select="
            for $sub-a in 1 to $seq-len - 1 return $sub-a
                [$rows[$sub-a           ]/node[$index         ][permutation/@cat = $rule/rulechild[1]/@cat]]
                [$rows[$seq-len - $sub-a]/node[$index + $sub-a][permutation/@cat = $rule/rulechild[2]/@cat]]">            
            <xsl:variable name="sub-a"    select="." as="xs:integer" />
            <permutation cat="{$rule/@cat}">
              <node>
                <xsl:copy-of select="$rows[$sub-a]/node[$index]/permutation[@cat eq $rule/rulechild[1]/@cat]" />
              </node>
              <node>
                <xsl:copy-of select="$rows[$seq-len - $sub-a]/node[$index + $sub-a]/permutation[@cat eq $rule/rulechild[2]/@cat]" />
              </node>
            </permutation>
          </xsl:for-each>     
        </xsl:for-each>   
      </node>
    </xsl:for-each>
  </analysis>
</xsl:function>

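<xsl:template match="@*|node()">
  <!-- Identity copy: selecting @* keeps the cat attribute on copied word elements. -->
  <xsl:copy>
    <xsl:apply-templates select="@*|node()" />
  </xsl:copy>
</xsl:template>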

<xsl:template match="sentences">
  <trees>
    <xsl:apply-templates />
  </trees>
</xsl:template>

<xsl:template match="sentence">
  <tree>
    <xsl:variable name="first-row"  select="so:analysis-1(.)" />
    <xsl:apply-templates select="
       fold-left(
          2 to count( $first-row/node),
          $first-row,
          function($a, $b) { $a, so:next-analysis($a) })
       [last()]" />
  </tree>
</xsl:template>

<xsl:template match="analysis">
  <xsl:apply-templates />
</xsl:template>

<xsl:template match="node[not( fn:empty(*))]">
    <node cat="{permutation[1]/@cat}">
        <xsl:apply-templates select="permutation[1]/*"/>
    </node>
</xsl:template>

<xsl:template match="node[fn:empty(*)]">
    <node xsi:nil="true" />
</xsl:template>

</xsl:stylesheet>

Alternative output

The output should be tidier and look like this...

<trees xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   <tree>
      <node cat="s">
         <node cat="np">
            <word cat="np">dog</word>
         </node>
         <node cat="vp">
            <word cat="vp">bark</word>
         </node>
      </node>
   </tree>
   <tree>
      <node cat="s">
         <node cat="np">
            <word cat="np">cat</word>
         </node>
         <node cat="vp">
            <node cat="vt">
               <word cat="vt">drink</word>
            </node>
            <node cat="np">
               <word cat="np">milk</word>
            </node>
         </node>
      </node>
   </tree>   
</trees>
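The alternative stylesheet keeps only permutation[1] at each node. A hedged Python model of that choice (hypothetical names, same rules and words as the stylesheet) stores each chart cell as an ordered list of (category, split, left, right) entries, created in the same order as the stylesheet's permutations (rules first, then splits), and then follows the first matching entry down to the words. The nested tuples correspond to the nested node elements in the alternative output above:

```python
# Same rules and words as the answer's stylesheet. Their order fixes which
# permutation comes first, just as document order does in the XSLT.
RULES = [("s", "np", "vp"), ("vp", "vt", "np")]  # (parent, left, right)
WORDS = [("vi", "bark"), ("vp", "bark"), ("vt", "drink"),
         ("np", "dog"), ("np", "cat"), ("np", "milk")]


def build_chart(tokens):
    """chart[length - 1][start] is an ordered list of permutations.

    A permutation is (cat, split, left, right); word permutations use
    split = left = right = None.
    """
    n = len(tokens)
    chart = [[[(cat, None, None, None) for cat, w in WORDS if w == tok]
              for tok in tokens]]
    for length in range(2, n + 1):
        row = []
        for start in range(n - length + 1):
            cell = []
            for parent, left, right in RULES:  # rules first, then splits, as in the XSLT
                for split in range(1, length):
                    left_cell = chart[split - 1][start]
                    right_cell = chart[length - split - 1][start + split]
                    if (any(p[0] == left for p in left_cell)
                            and any(p[0] == right for p in right_cell)):
                        cell.append((parent, split, left, right))
            row.append(cell)
        chart.append(row)
    return chart


def first_tree(chart, tokens, length, start, cat=None):
    """Keep only the first permutation (optionally restricted to `cat`)."""
    options = [p for p in chart[length - 1][start] if cat is None or p[0] == cat]
    if not options:
        return None  # plays the role of <node xsi:nil="true"/>
    chosen, split, left, right = options[0]
    if split is None:
        return (chosen, tokens[start])
    return (chosen,
            first_tree(chart, tokens, split, start, left),
            first_tree(chart, tokens, length - split, start + split, right))


def parse(sentence):
    tokens = sentence.split()
    if not tokens:
        return None
    return first_tree(build_chart(tokens), tokens, len(tokens), 0)


print(parse("cat drink milk"))
# ('s', ('np', 'cat'), ('vp', ('vt', 'drink'), ('np', 'milk')))
```

Restricting each child lookup to the category named by the rule mirrors the copy-of filter [@cat eq $rule/rulechild[n]/@cat], which is why bark comes out as vp rather than vi under the s rule.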

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/33340967

