
XSLT: split a large single parent node into smaller, grouped child nodes

Stack Overflow user
Asked 2013-08-22 22:46:23
Answers: 2 · Views: 1.9K · Followers: 0 · Votes: 2

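I asked this question recently, but realized I had not explained it clearly. I have a large .csv file of invoices (8000+ lines), with multiple lines per invoice. I parse it into an XML structure like the following (simplified).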

Input 1 - $XMLInput

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <row>
        <invoiceNumber>1</invoiceNumber>
        <invoiceText>invoice 1-1</invoiceText>
        <position>1</position>
        ...
    </row>
    <row>
        <invoiceNumber>1</invoiceNumber>
        <invoiceText>invoice 1-2</invoiceText>
        <position>2</position>
        ...
    </row>
    <row>
        <invoiceNumber>2</invoiceNumber>
        <invoiceText>invoice 2-1</invoiceText>
        <position>3</position>
        ...
    </row>
    <row>
        <invoiceNumber>2</invoiceNumber>
        <invoiceText>invoice 2-2</invoiceText>
        <position>4</position>
        ...
    </row>
    <row>
        <invoiceNumber>3</invoiceNumber>
        <invoiceText>invoice 3-1</invoiceText>
        <position>5</position>
        ...
    </row>
    <row>
        <invoiceNumber>3</invoiceNumber>
        <invoiceText>invoice 3-2</invoiceText>
        <position>6</position>
        ...
    </row>
</root>

Input 2 - $maxBatchSize. Description: a constant; once a batch grows beyond this size, break to the next batch.

Input 3 - $listOfInvoices. Description: a repeating variable holding the unique invoice numbers in the file. Example:

<root>
    <row>
        <invoiceNumber>1</invoiceNumber>
    </row>
    <row>
        <invoiceNumber>2</invoiceNumber>
    </row>
    <row>
        <invoiceNumber>3</invoiceNumber>
    </row>
</root>

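To improve processing time, I need to group these elements by invoiceNumber, with each batch no larger than X nodes (the value will be passed in as a variable). From there I will send each batch to a sub-processor in parallel, rather than processing the whole original document at once. For example, for the sample XML document above, if the batch size may not be greater than 3, the following XML output is required: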

Output 1 - $XMLOutput

<root>
    <batch>
        <row>
            <invoiceNumber>1</invoiceNumber>
            <invoiceText>invoice 1-1</invoiceText>
            <position>1</position>
            ...
        </row>
        <row>
            <invoiceNumber>1</invoiceNumber>
            <invoiceText>invoice 1-2</invoiceText>
            <position>2</position>
            ...
        </row>
        <row>
            <invoiceNumber>2</invoiceNumber>
            <invoiceText>invoice 2-1</invoiceText>
            <position>3</position>
            ...
        </row>
        <row>
            <invoiceNumber>2</invoiceNumber>
            <invoiceText>invoice 2-2</invoiceText>
            <position>4</position>
            ...
        </row>
    </batch>
    <batch>
        <row>
            <invoiceNumber>3</invoiceNumber>
            <invoiceText>invoice 3-1</invoiceText>
            <position>5</position>
            ...
        </row>
        <row>
            <invoiceNumber>3</invoiceNumber>
            <invoiceText>invoice 3-2</invoiceText>
            <position>6</position>
            ...
        </row>
    </batch>
</root>

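It is a requirement that all lines of an invoice are sent in the same batch. My initial XSLT (2.0) attempt is below. I tried to simulate a while loop by recursively calling a template that appends invoice groups to the current node. When the maximum batch size is reached, I recursively call the batch template to create a new batch. I pass the invoice and batch counters along with each recursive call.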

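EDIT: Thanks to Ken's help, I am getting closer. I do need to batch by the number of rows, not by the number of distinct invoices. Even if the code below works in theory, I am not sure how to make sure an invoice number has not already appeared in a preceding sibling node.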

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:bpws="http://schemas.xmlsoap.org/ws/2003/03/business-process/" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <xsl:variable name="batch-size" select="40" as="xs:integer"/>
<xsl:variable name="input" select="bpws:getVariableData('sortedInvoicesByBU')"/>
<xsl:key name="invoice-lines-by-invoice-number" match="row" use="invoiceNumber4z"/>

<xsl:template match="/">
    <xsl:element name="batches">
        <!--establish batches from possible non-contiguous invoice numbers-->
        <xsl:for-each-group select="$input/*:UPSData/*:row" group-by="(position() - 1) idiv $batch-size">
            <xsl:for-each select="distinct-values($input/*:UPSData/*:row/*:invoiceNumber4z)[not(.=preceding-sibling::item)]">
                <xsl:element name="UPSData">
                    <xsl:for-each select="current()">
                        <xsl:for-each select="key('invoice-lines-by-invoice-number',.,$input)">
                            <!--copy rows as they are-->
                            <xsl:copy-of select="."/>
                        </xsl:for-each>
                    </xsl:for-each>
                </xsl:element>
            </xsl:for-each>
        </xsl:for-each-group>
    </xsl:element>
</xsl:template>
</xsl:stylesheet>

2 Answers

Stack Overflow user

Accepted answer

Answered 2013-08-23 01:29:37

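I tell my students that one can torture a stylesheet as much as one likes until it finally works, but that does not make it maintainable, or even the right way to do the job. I hope you can accept the analysis that you are treating XSLT as an imperative programming language. That does the language no favours, and it will only convince you that things that are easy in C or Java are hard, verbose, and awkward in XSLT.

If you use XSLT the way it was designed, though, it becomes far easier than an imperative language, and, to boot, it is all XML-based, so you show the result you want in place. Because the stylesheet is shorter, it is easier to maintain. Once you understand the declarative instructions in use, you do not have to untangle an imperative algorithm. An XSLT processor can optimize a declarative approach, whereas if it has to follow an imperative approach as written, without being able to optimize it, it is obliged to work slowly.

In the solution below, which produces exactly the Output 1 result, note how I determine the unique invoice numbers and then filter them against the valid invoice numbers. Those numbers are then batched according to the batch size (which is a parameter). There are no called templates and no counters of any kind, just a solution using the built-in facilities of XSLT 2.0.

Not counting the declarations of global parameters and variables, or the comments, it has only 5 elements: <root>, <xsl:for-each-group>, <batch>, <xsl:for-each>, <xsl:copy-of>.

As for your question of why yours does not work, I do not know. The approach you have taken is not XSLT-like. It feels like an XSLT rendering of some imperative programming approach.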

t:\ftemp>type numbers.xml 
<root>
    <row>
        <invoiceNumber>1</invoiceNumber>
    </row>
    <row>
        <invoiceNumber>2</invoiceNumber>
    </row>
    <row>
        <invoiceNumber>3</invoiceNumber>
    </row>
</root>

t:\ftemp>type invoices.xml 
<?xml version="1.0" encoding="UTF-8"?>
<root>
    <row>
        <invoiceNumber>1</invoiceNumber>
        <invoiceText>invoice 1-1</invoiceText>
        <position>1</position>
        ...
    </row>
    <row>
        <invoiceNumber>1</invoiceNumber>
        <invoiceText>invoice 1-2</invoiceText>
        <position>2</position>
        ...
    </row>
    <row>
        <invoiceNumber>2</invoiceNumber>
        <invoiceText>invoice 2-1</invoiceText>
        <position>3</position>
        ...
    </row>
    <row>
        <invoiceNumber>2</invoiceNumber>
        <invoiceText>invoice 2-2</invoiceText>
        <position>4</position>
        ...
    </row>
    <row>
        <invoiceNumber>3</invoiceNumber>
        <invoiceText>invoice 3-1</invoiceText>
        <position>5</position>
        ...
    </row>
    <row>
        <invoiceNumber>3</invoiceNumber>
        <invoiceText>invoice 3-2</invoiceText>
        <position>6</position>
        ...
    </row>
</root>

t:\ftemp>call xslt2 invoices.xml invoices.xsl 
<?xml version="1.0" encoding="UTF-8"?>
<root>
   <batch>
      <row>
        <invoiceNumber>1</invoiceNumber>
        <invoiceText>invoice 1-1</invoiceText>
        <position>1</position>
        ...
    </row>
      <row>
        <invoiceNumber>1</invoiceNumber>
        <invoiceText>invoice 1-2</invoiceText>
        <position>2</position>
        ...
    </row>
      <row>
        <invoiceNumber>2</invoiceNumber>
        <invoiceText>invoice 2-1</invoiceText>
        <position>3</position>
        ...
    </row>
      <row>
        <invoiceNumber>2</invoiceNumber>
        <invoiceText>invoice 2-2</invoiceText>
        <position>4</position>
        ...
    </row>
   </batch>
   <batch>
      <row>
        <invoiceNumber>3</invoiceNumber>
        <invoiceText>invoice 3-1</invoiceText>
        <position>5</position>
        ...
    </row>
      <row>
        <invoiceNumber>3</invoiceNumber>
        <invoiceText>invoice 3-2</invoiceText>
        <position>6</position>
        ...
    </row>
   </batch>
</root>

t:\ftemp>type invoices.xsl 
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

<xsl:output indent="yes"/>

<xsl:param name="batch-size" select="2"/>

<xsl:variable name="valid-numbers"
              select="doc('numbers.xml')/root/row/invoiceNumber"/>

<xsl:template match="/">
  <xsl:variable name="invoiceLines" select="root/row"/>
  <root>
    <!--establish batches from possible non-contiguous invoice numbers-->
    <xsl:for-each-group  group-by="(position() - 1) idiv $batch-size" 
      select="distinct-values($invoiceLines/invoiceNumber)[.=$valid-numbers]">
      <!--create a batch using all invoice lines for all numbers in group-->
      <batch>
        <xsl:for-each select="$invoiceLines[invoiceNumber=current-group()]">
          <!--copy rows as they are-->
          <xsl:copy-of select="."/>
        </xsl:for-each>
      </batch>
    </xsl:for-each-group>
  </root>
</xsl:template>

</xsl:stylesheet>
t:\ftemp>rem Done! 
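
For readers who want to check the batching arithmetic outside an XSLT processor, here is a minimal Python sketch of the same idea: take the distinct invoice numbers, keep only the valid ones, cut them into groups of `batch_size` (the analogue of `(position() - 1) idiv $batch-size`), and copy every row whose number falls in the group. The function name, the dict-based rows, and the sample data are assumptions made for illustration; the sketch also assumes distinct numbers are taken in first-seen order, whereas the order returned by `distinct-values()` is implementation-dependent.

```python
def batch_by_invoice_count(rows, valid_numbers, batch_size):
    """Group rows into batches holding at most batch_size distinct invoices."""
    valid = set(valid_numbers)
    # Distinct invoice numbers in first-seen order (an assumption of this sketch).
    seen = dict.fromkeys(row["invoiceNumber"] for row in rows)
    distinct = [number for number in seen if number in valid]
    batches = []
    # Each slice plays the role of one group keyed by (position() - 1) idiv batch_size.
    for start in range(0, len(distinct), batch_size):
        group = set(distinct[start:start + batch_size])
        # Rows are copied in document order, like $invoiceLines[invoiceNumber = current-group()].
        batches.append([row for row in rows if row["invoiceNumber"] in group])
    return batches


# Hypothetical sample mirroring invoices.xml: three invoices with two lines each.
sample_rows = [
    {"invoiceNumber": "1", "invoiceText": "invoice 1-1"},
    {"invoiceNumber": "1", "invoiceText": "invoice 1-2"},
    {"invoiceNumber": "2", "invoiceText": "invoice 2-1"},
    {"invoiceNumber": "2", "invoiceText": "invoice 2-2"},
    {"invoiceNumber": "3", "invoiceText": "invoice 3-1"},
    {"invoiceNumber": "3", "invoiceText": "invoice 3-2"},
]

for batch in batch_by_invoice_count(sample_rows, ["1", "2", "3"], 2):
    print([row["invoiceText"] for row in batch])
```

With a batch size of 2 this yields one batch holding invoices 1 and 2 and a second batch holding invoice 3, the same split as the transcript above.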

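I am editing this answer to add the option below because you state you have 8000+ input records, and I think a key lookup table will perform better than my simple variable predicate. It produces the same result, adds one XSLT instruction to the template (it could be done without that instruction, but I find this easier to read), and removes a variable that is no longer needed.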

<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

<xsl:output indent="yes"/>

<xsl:param name="batch-size" select="2"/>

<xsl:variable name="valid-numbers"
              select="doc('numbers.xml')/root/row/invoiceNumber"/>

<xsl:key name="invoice-lines-by-invoice-number"
         match="row" use="invoiceNumber"/>

<xsl:variable name="input" select="/"/>

<xsl:template match="/">
  <root>
    <!--establish batches from possible non-contiguous invoice numbers-->
    <xsl:for-each-group  group-by="(position() - 1) idiv $batch-size" 
      select="distinct-values(root/row/invoiceNumber)[.=$valid-numbers]">
      <!--create a batch using all invoice lines for all numbers in group-->
      <batch>
        <xsl:for-each select="current-group()">
          <xsl:for-each
                     select="key('invoice-lines-by-invoice-number',.,$input)">
            <!--copy rows as they are-->
            <xsl:copy-of select="."/>
          </xsl:for-each>
        </xsl:for-each>
      </batch>
    </xsl:for-each-group>
  </root>
</xsl:template>

</xsl:stylesheet>
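
The difference between the predicate and the key can be pictured in Python as a scan versus a dictionary lookup. The sketch below is only an analogy, not how any particular XSLT processor implements `xsl:key`: it builds the index once, then assembles each batch by looking up the rows of each invoice number. The names `build_key` and `batch_with_key` and the sample rows are hypothetical.

```python
from collections import defaultdict


def build_key(rows):
    """Index rows by invoice number, loosely analogous to an xsl:key on row/invoiceNumber."""
    index = defaultdict(list)
    for row in rows:
        index[row["invoiceNumber"]].append(row)
    return index


def batch_with_key(rows, valid_numbers, batch_size):
    """Batch by distinct invoice count, fetching each invoice's rows from the index."""
    index = build_key(rows)
    valid = set(valid_numbers)
    # Index keys keep first-seen order (dicts preserve insertion order in Python 3.7+).
    distinct = [number for number in index if number in valid]
    batches = []
    for start in range(0, len(distinct), batch_size):
        batch = []
        for number in distinct[start:start + batch_size]:
            # One lookup per invoice number instead of rescanning every row.
            batch.extend(index[number])
        batches.append(batch)
    return batches


# Hypothetical rows in which the lines of invoice 2 are not contiguous.
rows_demo = [
    {"invoiceNumber": "2", "invoiceText": "invoice 2-1"},
    {"invoiceNumber": "1", "invoiceText": "invoice 1-1"},
    {"invoiceNumber": "2", "invoiceText": "invoice 2-2"},
]

for batch in batch_with_key(rows_demo, ["1", "2"], 1):
    print([row["invoiceText"] for row in batch])
```

Because each batch is assembled one invoice number at a time, the lines of an invoice come out together even when they are not contiguous in the input.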
Votes: 4

Stack Overflow user

Answered 2013-09-02 01:27:57

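Please do not mark this as the answer, since my earlier answer addressed the original question.

The code below answers the secondary question of how to batch by the total number of invoice lines without splitting an invoice across two batches.

I could not think of a declarative way to do this, so the answer below is an imperative, recursive solution. It is written so that an XSLT processor that implements tail recursion will not consume stack space. I also take advantage of native XSLT features (key tables and sequences) that are hard to emulate in other languages.

The code is tight: only one section actually writes out a batch of invoices, and there is no second block of batch-writing code. I am pleased with how it turned out.

I welcome any suggestions for improvement, or an alternative solution that is tighter than this one.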

t:\ftemp>type numbers.xml 
<root>
    <row>
        <invoiceNumber>1</invoiceNumber>
    </row>
    <row>
        <invoiceNumber>2</invoiceNumber>
    </row>
    <row>
        <invoiceNumber>3</invoiceNumber>
    </row>
    <row>
        <invoiceNumber>4</invoiceNumber>
    </row>
    <row>
        <invoiceNumber>5</invoiceNumber>
    </row>
</root>

t:\ftemp>type invoices.xml 
<?xml version="1.0" encoding="UTF-8"?>
<root>
    <row>
        <invoiceNumber>1</invoiceNumber>
        <invoiceText>invoice 1-1</invoiceText>
        <position>1</position>
        ...
    </row>
    <row>
        <invoiceNumber>1</invoiceNumber>
        <invoiceText>invoice 1-2</invoiceText>
        <position>2</position>
        ...
    </row>
    <row>
        <invoiceNumber>2</invoiceNumber>
        <invoiceText>invoice 2-1</invoiceText>
        <position>3</position>
        ...
    </row>
    <row>
        <invoiceNumber>2</invoiceNumber>
        <invoiceText>invoice 2-2</invoiceText>
        <position>4</position>
        ...
    </row>
    <row>
        <invoiceNumber>3</invoiceNumber>
        <invoiceText>invoice 3-1</invoiceText>
        <position>5</position>
        ...
    </row>
    <row>
        <invoiceNumber>3</invoiceNumber>
        <invoiceText>invoice 3-2</invoiceText>
        <position>6</position>
        ...
    </row>
    <row>
        <invoiceNumber>4</invoiceNumber>
        <invoiceText>invoice 4-1</invoiceText>
        <position>7</position>
        ...
    </row>
    <row>
        <invoiceNumber>4</invoiceNumber>
        <invoiceText>invoice 4-2</invoiceText>
        <position>8</position>
        ...
    </row>
    <row>
        <invoiceNumber>4</invoiceNumber>
        <invoiceText>invoice 4-3</invoiceText>
        <position>9</position>
        ...
    </row>
    <row>
        <invoiceNumber>4</invoiceNumber>
        <invoiceText>invoice 4-4</invoiceText>
        <position>10</position>
        ...
    </row>
    <row>
        <invoiceNumber>4</invoiceNumber>
        <invoiceText>invoice 4-5</invoiceText>
        <position>11</position>
        ...
    </row>
    <row>
        <invoiceNumber>4</invoiceNumber>
        <invoiceText>invoice 4-6</invoiceText>
        <position>12</position>
        ...
    </row>
    <row>
        <invoiceNumber>5</invoiceNumber>
        <invoiceText>invoice 5-1</invoiceText>
        <position>13</position>
        ...
    </row>
    <row>
        <invoiceNumber>5</invoiceNumber>
        <invoiceText>invoice 5-2</invoiceText>
        <position>14</position>
        ...
    </row>
</root>

t:\ftemp>call xslt2 invoices.xml invoices.xsl 
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <!--Batch max lines: 5-->
  <batch>
    <!--invoice numbers: 1 2-->
    <!--total line count: 4-->
    <row>
        <invoiceNumber>1</invoiceNumber>
        <invoiceText>invoice 1-1</invoiceText>
        <position>1</position>
        ...
    </row>
      <row>
        <invoiceNumber>1</invoiceNumber>
        <invoiceText>invoice 1-2</invoiceText>
        <position>2</position>
        ...
    </row>
      <row>
        <invoiceNumber>2</invoiceNumber>
        <invoiceText>invoice 2-1</invoiceText>
        <position>3</position>
        ...
    </row>
      <row>
        <invoiceNumber>2</invoiceNumber>
        <invoiceText>invoice 2-2</invoiceText>
        <position>4</position>
        ...
    </row>
   </batch>
   <batch>
    <!--invoice numbers: 3-->
    <!--total line count: 2-->
    <row>
        <invoiceNumber>3</invoiceNumber>
        <invoiceText>invoice 3-1</invoiceText>
        <position>5</position>
        ...
    </row>
      <row>
        <invoiceNumber>3</invoiceNumber>
        <invoiceText>invoice 3-2</invoiceText>
        <position>6</position>
        ...
    </row>
   </batch>
   <batch>
    <!--invoice numbers: 4-->
    <!--total line count: 6-->
    <row>
        <invoiceNumber>4</invoiceNumber>
        <invoiceText>invoice 4-1</invoiceText>
        <position>7</position>
        ...
    </row>
      <row>
        <invoiceNumber>4</invoiceNumber>
        <invoiceText>invoice 4-2</invoiceText>
        <position>8</position>
        ...
    </row>
      <row>
        <invoiceNumber>4</invoiceNumber>
        <invoiceText>invoice 4-3</invoiceText>
        <position>9</position>
        ...
    </row>
      <row>
        <invoiceNumber>4</invoiceNumber>
        <invoiceText>invoice 4-4</invoiceText>
        <position>10</position>
        ...
    </row>
      <row>
        <invoiceNumber>4</invoiceNumber>
        <invoiceText>invoice 4-5</invoiceText>
        <position>11</position>
        ...
    </row>
      <row>
        <invoiceNumber>4</invoiceNumber>
        <invoiceText>invoice 4-6</invoiceText>
        <position>12</position>
        ...
    </row>
   </batch>
   <batch>
    <!--invoice numbers: 5-->
    <!--total line count: 2-->
    <row>
        <invoiceNumber>5</invoiceNumber>
        <invoiceText>invoice 5-1</invoiceText>
        <position>13</position>
        ...
    </row>
      <row>
        <invoiceNumber>5</invoiceNumber>
        <invoiceText>invoice 5-2</invoiceText>
        <position>14</position>
        ...
    </row>
   </batch>
</root>

t:\ftemp>type invoices.xsl 
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

<xsl:output indent="yes"/>

<xsl:param name="batch-size" select="5"/>

<xsl:variable name="valid-numbers"
              select="doc('numbers.xml')/root/row/invoiceNumber"/>

<xsl:key name="invoice-lines-by-invoice-number"
         match="row" use="invoiceNumber"/>

<xsl:variable name="input" select="/"/>

<xsl:template match="/">
  <root>
    <xsl:text>&#xa;  </xsl:text>
    <xsl:comment select="'Batch max lines:',$batch-size"/>
    <xsl:text>&#xa;  </xsl:text>
    <xsl:call-template name="next-batch">
      <xsl:with-param name="remaining-numbers" 
        select="distinct-values(root/row/invoiceNumber)[.=$valid-numbers]"/>
    </xsl:call-template>
  </root>
</xsl:template>

<xsl:template name="next-batch">
  <xsl:param name="this-batch-lines" select="0"/>
  <xsl:param name="this-batch-numbers" select="()"/>
  <xsl:param name="remaining-numbers" required="yes"/>
  <xsl:variable name="this-invoice" select="$remaining-numbers[1]"/>
  <xsl:variable name="this-invoice-lines"
  select="count(key('invoice-lines-by-invoice-number',$this-invoice,$input))"/>

  <xsl:choose>
    <xsl:when test="not($this-invoice) and not($this-batch-lines)">
      <!--nothing to clean up and nothing more to do-->
    </xsl:when>
    <xsl:when test="not($this-invoice) (:last invoice complete:) or
                    ( $this-batch-lines + $this-invoice-lines > $batch-size )
                      (:this invoice exceeds limit:)">
      <!--clean up previous unfinished batch-->
      <batch>
        <xsl:text>&#xa;    </xsl:text>
        <xsl:comment select="'invoice numbers:',$this-batch-numbers"/>
        <xsl:text>&#xa;    </xsl:text>
        <xsl:comment select="'total line count:',$this-batch-lines"/>
        <xsl:text>&#xa;    </xsl:text>
        <xsl:copy-of select="for $num in $this-batch-numbers return
                         key('invoice-lines-by-invoice-number',$num,$input)"/>
      </batch>
      <xsl:if test="$this-invoice">
        <!--continue with the next batch comprised of this invoice only-->
        <xsl:call-template name="next-batch">
          <xsl:with-param name="this-batch-lines"
                          select="$this-invoice-lines"/>
          <xsl:with-param name="this-batch-numbers"
                          select="$this-invoice"/>
          <xsl:with-param name="remaining-numbers" 
                          select="$remaining-numbers[position()>1]"/>
        </xsl:call-template>
      </xsl:if>
      <!--the cleaned up batch was the last batch, template recursion ends-->
    </xsl:when>
    <xsl:otherwise>
      <!--a batch limit has not been exceeded; add this invoice to batch-->
      <xsl:call-template name="next-batch">
        <xsl:with-param name="this-batch-lines"
                        select="$this-batch-lines + $this-invoice-lines"/>
        <xsl:with-param name="this-batch-numbers"
                        select="($this-batch-numbers,$this-invoice)"/>
        <xsl:with-param name="remaining-numbers"
                          select="$remaining-numbers[position()>1]"/>
      </xsl:call-template>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

</xsl:stylesheet>
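
The recursion in `next-batch` amounts to a greedy pass over the invoice numbers: keep adding invoices to the current batch until the next one would push the line count over the limit, then close the batch and start a new one with that invoice. A minimal Python sketch of that pass, written as a loop rather than recursion, is shown here. The function name and the flat list of invoice numbers (one entry per line) are assumptions made for illustration. The sketch closes a batch only when it already holds at least one invoice; that guard is a choice made here.

```python
from collections import Counter


def batch_by_line_count(invoice_numbers, max_lines):
    """Greedily pack whole invoices into batches of at most max_lines lines.

    invoice_numbers holds one entry per invoice line. An invoice that alone
    has more than max_lines lines still gets a batch of its own.
    """
    line_counts = Counter(invoice_numbers)
    batches = []
    current, current_lines = [], 0
    # dict.fromkeys gives the distinct numbers in first-seen order.
    for number in dict.fromkeys(invoice_numbers):
        lines = line_counts[number]
        # Close the running batch if this invoice would exceed the limit.
        if current and current_lines + lines > max_lines:
            batches.append(current)
            current, current_lines = [], 0
        current.append(number)
        current_lines += lines
    if current:
        # Flush the last, unfinished batch.
        batches.append(current)
    return batches


# Line-per-entry list mirroring invoices.xml above: invoice 4 has six lines.
lines_demo = ["1", "1", "2", "2", "3", "3", "4", "4", "4", "4", "4", "4", "5", "5"]
print(batch_by_line_count(lines_demo, 5))
```

With a limit of 5 lines this groups the invoices as 1 and 2, then 3, then 4, then 5, matching the batches in the transcript above; invoice 4 exceeds the limit on its own and so sits alone in its batch.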
Votes: 0
The original content of this page is provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/18392000
