
Using the Cocke-Younger-Kasami (CYK) algorithm in XSLT to transform XML and generate a new XML file

Stack Overflow user
Asked 2015-10-26 07:59:37
Answers: 1, Views: 181, Followers: 0, Votes: 0

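For background on the CYK algorithm, see the link below:

algorithm

I have two input XML files, shown below. I need to pass sentance.xml to the XSLT and, based on the words in each sentence, read the matching values from Rule.xml at run time and generate the new XML shown further down. The solution must use only XSLT, XPath, and XML, with no other language.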

1) sentance.xml

<?xml version="1.0" encoding="UTF-8"?>
<sentances>
  <s>dog bark</s>
  <s>cat drink milk</s>
</sentances>

2) Rule.xml

<?xml version="1.0" encoding="UTF-8"?>
<allrules>
<rules>
    <rule cat="s">
        <rulechild cat="np"/>
        <rulechild cat="vp"/>
    </rule>
    <rule cat="vp">
        <rulechild cat="vt"/>
        <rulechild cat="np"/>
    </rule>
    <rule cat="vp">
        <rulechild cat="vi"/>
    </rule> 
</rules>
<words>
    <word cat="vi">bark</word>
    <word cat="vt">drink</word>
    <word cat="pn">dog</word>
    <word cat="pn">cat</word>
    <word cat="pn">milk</word>
</words>
</allrules>

The output XML should look like this:

<trees>
<tree>
    <sentace>dog bark</sentace>
    <node cat="s">
        <node cat="np">
            <word cat="pn">dog</word>
        </node>
        <node cat="vp">
            <word cat="vi">bark</word>
        </node>
    </node>
</tree>
<tree>
    <sentace>cat drink milk</sentace>
    <node cat="s">
        <node cat="np">
            <word cat="pn">cat</word>
        </node>
        <node cat="vp">
            <word cat="vt">drink</word>
            <node cat="np">
                <word cat="pn">milk</word>
            </node>
        </node>
    </node>
</tree>
</trees>

Is it possible to implement the CYK algorithm in XSLT and generate the output above?


1 Answer

Stack Overflow user

Answered 2015-10-29 05:34:27

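Here is a solution that should come close to what you asked for. You did not specify an XSLT version, so this one uses XSLT 3.0. I have embedded the rules and symbols in the stylesheet, but you can easily adapt them to be external documents.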
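As a rough sketch of that adaptation, the two embedded variables could be replaced by ones that read the question's Rule.xml. The variable name `$rule-doc` and the file location (next to the stylesheet) are assumptions, not part of the original answer. Note also that the embedded data below differs from the question's Rule.xml: it keeps only binary rules, tags the nouns as `np` rather than `pn`, and adds a `vp` entry for `bark`, so the external file would need the same adjustments to give the same result.

```xml
<!-- Assumption: Rule.xml sits beside the stylesheet; a relative URI passed to
     document() as a string is resolved against the stylesheet's base URI. -->
<xsl:variable name="rule-doc" select="document('Rule.xml')" as="document-node()" />

<!-- The functions in this answer only handle rules with exactly two children,
     so any other rule is filtered out here rather than silently misapplied. -->
<xsl:variable name="rules" as="element(rule)*"
              select="$rule-doc/allrules/rules/rule[count(rulechild) eq 2]" />

<xsl:variable name="words" as="element(word)*"
              select="$rule-doc/allrules/words/word" />
```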

If XSLT 3.0 is not available to you, you can replace fold-left() with tail recursion.
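One way that replacement might look is sketched here. The function name `so:all-rows` is hypothetical; it relies on the `so:next-analysis` function defined in the stylesheet and, like the rest of this answer, has not been run through a processor.

```xml
<!-- Hypothetical helper, not part of the original stylesheet.
     Starting from the rows computed so far, it keeps appending the next CYK row
     until there are as many rows as words, then returns all of them.
     The recursive call is in tail position. -->
<xsl:function name="so:all-rows" as="element(analysis)+">
  <xsl:param name="rows" as="element(analysis)+" />
  <xsl:param name="word-count" as="xs:integer" />
  <xsl:sequence select="
    if (count($rows) ge $word-count)
    then $rows
    else so:all-rows(($rows, so:next-analysis($rows)), $word-count)" />
</xsl:function>
```

With this in place, the fold-left() expression in the `sentence` template would become `so:all-rows($first-row, count($first-row/node))[last()]`, and the stylesheet's `version` attribute would presumably be lowered to 2.0.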

This input document ...

<sentences>
  <sentence>dog bark</sentence>
  <sentence>cat drink milk</sentence>
</sentences>

... when fed into this XSLT 3.0 stylesheet ...

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:fn="http://www.w3.org/2005/xpath-functions"
  xmlns:so="http://stackoverflow.com/questions/33340967"
  version="3.0"
  exclude-result-prefixes="xsl xs fn so">

<xsl:output encoding="utf-8" omit-xml-declaration="yes" indent="yes" />
<xsl:strip-space elements="*" />

<xsl:variable name="rules" as="element(rule)*">
    <rule cat="s">
        <!-- All rules have precisely 2 children. -->
        <rulechild cat="np"/>
        <rulechild cat="vp"/>
    </rule>
    <rule cat="vp">
        <rulechild cat="vt"/>
        <rulechild cat="np"/>
    </rule>
</xsl:variable>

<xsl:variable name="words" as="element(word)+">
    <word cat="vi">bark</word>
    <word cat="vp">bark</word>
    <word cat="vt">drink</word>
    <word cat="np">dog</word>
    <word cat="np">cat</word>
    <word cat="np">milk</word>
</xsl:variable>

<!--
  The n'th analysis contains the CYK analysis for symbol sequences of length n.
  Let there be s symbols in the sentence.
  analysis[1] has s children.
  analysis[s] has one child.
  analysis[n] has s - n + 1 children.
  The children of analysis are node elements and only node elements.
  A node element represents a node in the CYK analysis. This can cover either a word or a string of symbols.
  The index of the node within its parent analysis corresponds to the start symbol.
    This index is equal to the position, within the sentence, of the starting word.
  A node has any number of children, but the only type they can be is permutation.
  A permutation represents a possible value for the node content, the competing alternatives
   being all the sibling permutations. Thus if a node has no permutations, there is no
   possibility of a sequence of the given length being grammatical at that position
   in the sentence.
  Each permutation has as children either 1 word or 2 nodes.
  The permutations in the first row (analysis[1]) are all of the word type.
  Subsequent rows have permutations of any type.
  words and permutations all have an attribute cat, which is the symbol.
-->

<xsl:function name="so:analysis-1" as="element(analysis)"> 
  <!-- Do the first row of CYK. -->
  <xsl:param name="sentence" as="xs:string" />
  <analysis>
    <xsl:analyze-string select="$sentence" regex="\w+">
      <xsl:matching-substring>
        <xsl:variable name="word" select="." />
        <node>
          <xsl:for-each select="$words[. eq $word]">
            <permutation cat="{@cat}"> 
              <word cat="{@cat}"><xsl:value-of select="$word" /></word>
            </permutation>
          </xsl:for-each>
        </node>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </analysis>
</xsl:function>

<xsl:function name="so:next-analysis" as="element(analysis)"> 
  <!-- Given the first n rows of CYK, compute the n+1'th row. -->
  <xsl:param name="rows" as="element(analysis)+" />
  <xsl:variable name="word-count" select="count( $rows[1]/node)" as="xs:integer" />
  <xsl:variable name="node-count" select="count( $rows[last()]/node) - 1" as="xs:integer" />
  <xsl:variable name="seq-len"    select="$word-count - $node-count + 1" as="xs:integer" />
  <analysis>
    <xsl:for-each select="1 to $node-count">
      <xsl:variable name="index" select="." as="xs:integer" />
      <node>
        <xsl:for-each select="$rules">
          <xsl:variable name="rule" as="element(rule)" select="." />
          <xsl:for-each select="
            for $sub-a in 1 to $seq-len - 1 return $sub-a
                [$rows[$sub-a           ]/node[$index         ][permutation/@cat = $rule/rulechild[1]/@cat]]
                [$rows[$seq-len - $sub-a]/node[$index + $sub-a][permutation/@cat = $rule/rulechild[2]/@cat]]">            
            <xsl:variable name="sub-a"    select="." as="xs:integer" />
            <permutation cat="{$rule/@cat}">
              <node>
                <xsl:copy-of select="$rows[$sub-a]/node[$index]/permutation[@cat eq $rule/rulechild[1]/@cat]" />
              </node>
              <node>
                <xsl:copy-of select="$rows[$seq-len - $sub-a]/node[$index + $sub-a]/permutation[@cat eq $rule/rulechild[2]/@cat]" />
              </node>
            </permutation>
          </xsl:for-each>     
        </xsl:for-each>   
      </node>
    </xsl:for-each>
  </analysis>
</xsl:function>

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()" />
  </xsl:copy>
</xsl:template>

<xsl:template match="sentences">
  <trees>
    <xsl:apply-templates />
  </trees>
</xsl:template>

<xsl:template match="sentence">
  <tree>
    <xsl:variable name="first-row"  select="so:analysis-1(.)" />
    <xsl:variable name="word-count" select="count( $first-row/node)" as="xs:integer" />
    <xsl:sequence select="fold-left( 2 to $word-count, $first-row, function($a, $b) { $a, so:next-analysis($a) })
      [last()]" />
  </tree>
</xsl:template>

</xsl:stylesheet>

... should produce this output.

<trees>
   <tree>
      <analysis>
         <node>
            <permutation cat="s">
               <node>
                  <permutation cat="np">
                     <word cat="np">dog</word>
                  </permutation>
               </node>
               <node>
                  <permutation cat="vp">
                     <word cat="vp">bark</word>
                  </permutation>
               </node>
            </permutation>
         </node>
      </analysis>
   </tree>
   <tree>
      <analysis>
         <node>
            <permutation cat="s">
               <node>
                  <permutation cat="np">
                     <word cat="np">cat</word>
                  </permutation>
               </node>
               <node>
                  <permutation cat="vp">
                     <node>
                        <permutation cat="vt">
                           <word cat="vt">drink</word>
                        </permutation>
                     </node>
                     <node>
                        <permutation cat="np">
                           <word cat="np">milk</word>
                        </permutation>
                     </node>
                  </permutation>
               </node>
            </permutation>
         </node>
      </analysis>
   </tree>   
</trees>

Caveat emptor

I have not tested this.
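To make the table layout described in the stylesheet comment more concrete, this is what `so:analysis-1` would be expected to return for the sentence "dog bark" with the embedded `$words`. It is traced by hand from the stylesheet, not taken from a processor run, and the indentation is only illustrative.

```xml
<!-- analysis[1] for "dog bark": one node per word, one permutation per
     matching entry in $words. -->
<analysis>
  <node>
    <permutation cat="np"><word cat="np">dog</word></permutation>
  </node>
  <node>
    <permutation cat="vi"><word cat="vi">bark</word></permutation>
    <permutation cat="vp"><word cat="vp">bark</word></permutation>
  </node>
</analysis>
```

The second node carries two permutations because `bark` is listed under both `vi` and `vp`. Only the `vp` permutation matches the second child of the rule `s -> np vp`, so it is the only one copied into the next row.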

Alternative

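If you are not interested in all the permutations and just want any one of them (the first), we can add a few templates that strip out every permutation but one.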

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:fn="http://www.w3.org/2005/xpath-functions"
  xmlns:so="http://stackoverflow.com/questions/33340967"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  version="3.0"
  exclude-result-prefixes="xsl xs fn so">

<xsl:output encoding="utf-8" omit-xml-declaration="yes" indent="yes" />
<xsl:strip-space elements="*" />

<xsl:variable name="rules" as="element(rule)*">
    <rule cat="s">
        <!-- All rules have precisely 2 children. -->
        <rulechild cat="np"/>
        <rulechild cat="vp"/>
    </rule>
    <rule cat="vp">
        <rulechild cat="vt"/>
        <rulechild cat="np"/>
    </rule>
</xsl:variable>

<xsl:variable name="words" as="element(word)+">
    <word cat="vi">bark</word>
    <word cat="vp">bark</word>
    <word cat="vt">drink</word>
    <word cat="np">dog</word>
    <word cat="np">cat</word>
    <word cat="np">milk</word>
</xsl:variable>

<xsl:function name="so:analysis-1" as="element(analysis)"> 
  <!-- Do the first row of CYK. -->
  <xsl:param name="sentence" as="xs:string" />
  <analysis>
    <xsl:analyze-string select="$sentence" regex="\w+">
      <xsl:matching-substring>
        <xsl:variable name="word" select="." />
        <node>
          <xsl:for-each select="$words[. eq $word]">
            <permutation cat="{@cat}"> 
              <word cat="{@cat}"><xsl:value-of select="$word" /></word>
            </permutation>
          </xsl:for-each>
        </node>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </analysis>
</xsl:function>

<xsl:function name="so:next-analysis" as="element(analysis)"> 
  <!-- Given the first n rows of CYK, compute the n+1'th row. -->
  <xsl:param name="rows" as="element(analysis)+" />
  <xsl:variable name="word-count" select="count( $rows[1]/node)" as="xs:integer" />
  <xsl:variable name="node-count" select="count( $rows[last()]/node) - 1" as="xs:integer" />
  <xsl:variable name="seq-len"    select="$word-count - $node-count + 1" as="xs:integer" />
  <analysis>
    <xsl:for-each select="1 to $node-count">
      <xsl:variable name="index" select="." as="xs:integer" />
      <node>
        <xsl:for-each select="$rules">
          <xsl:variable name="rule" as="element(rule)" select="." />
          <xsl:for-each select="
            for $sub-a in 1 to $seq-len - 1 return $sub-a
                [$rows[$sub-a           ]/node[$index         ][permutation/@cat = $rule/rulechild[1]/@cat]]
                [$rows[$seq-len - $sub-a]/node[$index + $sub-a][permutation/@cat = $rule/rulechild[2]/@cat]]">            
            <xsl:variable name="sub-a"    select="." as="xs:integer" />
            <permutation cat="{$rule/@cat}">
              <node>
                <xsl:copy-of select="$rows[$sub-a]/node[$index]/permutation[@cat eq $rule/rulechild[1]/@cat]" />
              </node>
              <node>
                <xsl:copy-of select="$rows[$seq-len - $sub-a]/node[$index + $sub-a]/permutation[@cat eq $rule/rulechild[2]/@cat]" />
              </node>
            </permutation>
          </xsl:for-each>     
        </xsl:for-each>   
      </node>
    </xsl:for-each>
  </analysis>
</xsl:function>

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()" />
  </xsl:copy>
</xsl:template>

<xsl:template match="sentences">
  <trees>
    <xsl:apply-templates />
  </trees>
</xsl:template>

<xsl:template match="sentence">
  <tree>
    <xsl:variable name="first-row"  select="so:analysis-1(.)" />
    <xsl:apply-templates select="
       fold-left(
          2 to count( $first-row/node),
          $first-row,
          function($a, $b) { $a, so:next-analysis($a) })
       [last()]" />
  </tree>
</xsl:template>

<xsl:template match="analysis">
  <xsl:apply-templates />
</xsl:template>

<xsl:template match="node[not( fn:empty(*))]">
    <node cat="{permutation[1]/@cat}">
        <xsl:apply-templates select="permutation[1]/*"/>
    </node>
</xsl:template>

<xsl:template match="node[fn:empty(*)]">
    <node xsi:nil="true" />
</xsl:template>

</xsl:stylesheet>

Alternative output

The output should be tidier and look like this.

<trees xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   <tree>
      <node cat="s">
         <node cat="np">
            <word cat="np">dog</word>
         </node>
         <node cat="vp">
            <word cat="vp">bark</word>
         </node>
      </node>
   </tree>
   <tree>
      <node cat="s">
         <node cat="np">
            <word cat="np">cat</word>
         </node>
         <node cat="vp">
            <node cat="vt">
               <word cat="vt">drink</word>
            </node>
            <node cat="np">
               <word cat="np">milk</word>
            </node>
         </node>
      </node>
   </tree>   
</trees>
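The target output in the question also carries the source sentence in a `sentace` element inside each `tree`, which the output above lacks. A possible variation of the `sentence` template from the alternative stylesheet is sketched here. It is untested, and the element name is simply copied from the question's spelling.

```xml
<xsl:template match="sentence">
  <tree>
    <!-- Echo the sentence text first, spelled as in the question's expected output. -->
    <sentace><xsl:value-of select="." /></sentace>
    <xsl:variable name="first-row" select="so:analysis-1(.)" />
    <!-- Same row-building expression as the alternative stylesheet:
         keep only the last CYK row and hand it to the tidy-up templates. -->
    <xsl:apply-templates select="
       fold-left(
          2 to count( $first-row/node),
          $first-row,
          function($a, $b) { $a, so:next-analysis($a) })
       [last()]" />
  </tree>
</xsl:template>
```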
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/33340967
