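I asked this question recently but realized I had not explained it clearly. I have a large .csv file (8000+ rows) of invoices, with multiple rows per invoice. I parse it into an XML structure like the following (simplified).
Input 1 - $XMLInput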
<?xml version="1.0" encoding="UTF-8"?>
<root>
<row>
<invoiceNumber>1</invoiceNumber>
<invoiceText>invoice 1-1</invoiceText>
<position>1</position>
...
</row>
<row>
<invoiceNumber>1</invoiceNumber>
<invoiceText>invoice 1-2</invoiceText>
<position>2</position>
...
</row>
<row>
<invoiceNumber>2</invoiceNumber>
<invoiceText>invoice 2-1</invoiceText>
<position>3</position>
...
</row>
<row>
<invoiceNumber>2</invoiceNumber>
<invoiceText>invoice 2-2</invoiceText>
<position>4</position>
...
</row>
<row>
<invoiceNumber>3</invoiceNumber>
<invoiceText>invoice 3-1</invoiceText>
<position>5</position>
...
</row>
<row>
<invoiceNumber>3</invoiceNumber>
<invoiceText>invoice 3-2</invoiceText>
<position>6</position>
...
</row>
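</root>
Input 2 - $maxBatchSize. Description: once a batch grows larger than this size (a constant), break to the next batch.
Input 3 - $listOfInvoices. Description: a repeating variable holding the unique invoice numbers in the file. Example: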
<root>
<row>
<invoiceNumber>1</invoiceNumber>
</row>
<row>
<invoiceNumber>2</invoiceNumber>
</row>
<row>
<invoiceNumber>3</invoiceNumber>
</row>
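</root>
To improve performance, I need to group these elements by invoiceNumber into batches, each no larger than X nodes (the value will be passed in as a variable). From there I will send each batch to a subprocess in parallel rather than processing the whole original document at once. For example, for the sample XML document above, if the batch size could be no greater than 3, the following XML output would be required:
Output 1 - $XMLOutput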
<root>
<batch>
<row>
<invoiceNumber>1</invoiceNumber>
<invoiceText>invoice 1-1</invoiceText>
<position>1</position>
...
</row>
<row>
<invoiceNumber>1</invoiceNumber>
<invoiceText>invoice 1-2</invoiceText>
<position>2</position>
...
</row>
<row>
<invoiceNumber>2</invoiceNumber>
<invoiceText>invoice 2-1</invoiceText>
<position>3</position>
...
</row>
<row>
<invoiceNumber>2</invoiceNumber>
<invoiceText>invoice 2-2</invoiceText>
<position>4</position>
...
</row>
</batch>
<batch>
<row>
<invoiceNumber>3</invoiceNumber>
<invoiceText>invoice 3-1</invoiceText>
<position>5</position>
...
</row>
<row>
<invoiceNumber>3</invoiceNumber>
<invoiceText>invoice 3-2</invoiceText>
<position>6</position>
...
</row>
</batch>
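</root>
It is a requirement that all lines of an invoice are sent in the same batch. My initial XSLT (2.0) attempt is below. I tried to simulate a while loop by recursively calling a template that appends invoice groups to the current node. When the maximum batch size is reached, I recursively call the batch template to create a new batch. I pass the invoice and batch counters along on each recursive call.
Edit: thanks to Ken's help I am getting closer. I do need to batch the invoices by the number of lines, not by the number of distinct invoices. Assuming the approach below can work in principle, I am not sure how to ensure that an invoice number has not already appeared in a preceding sibling node.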
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:bpws="http://schemas.xmlsoap.org/ws/2003/03/business-process/" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<xsl:variable name="batch-size" select="40" as="xs:integer"/>
<xsl:variable name="input" select="bpws:getVariableData('sortedInvoicesByBU')"/>
<xsl:key name="invoice-lines-by-invoice-number" match="row" use="invoiceNumber4z"/>
<xsl:template match="/">
<xsl:element name="batches">
<!--establish batches from possible non-contiguous invoice numbers-->
<xsl:for-each-group select="$input/*:UPSData/*:row" group-by="(position() - 1) idiv $batch-size">
<xsl:for-each select="distinct-values($input/*:UPSData/*:row/*:invoiceNumber4z)[not(.=preceding-sibling::item)]">
<xsl:element name="UPSData">
<xsl:for-each select="current()">
<xsl:for-each select="key('invoice-lines-by-invoice-number',.,$input)">
<!--copy rows as they are-->
<xsl:copy-of select="."/>
</xsl:for-each>
</xsl:for-each>
</xsl:element>
</xsl:for-each>
</xsl:for-each-group>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
Posted on 2013-08-23 01:29:37
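I tell my students that one can torture a stylesheet as much as needed to make it work eventually, but that does not make it maintainable, nor even the right way to do the job. I hope you can accept the analysis that you are treating XSLT as an imperative programming language, which does no justice to the language and only convinces you that things which are easier in C and Java are hard, long-winded and awkward in XSLT.
But if you use XSLT the way it was designed, it becomes much easier than an imperative language and, to boot, it is all XML-based: you write out the result you want. Because it is shorter, it is easier to maintain. Once you understand the declarative instructions being used, you do not have to untangle an imperative algorithm. An XSLT processor can optimize a declarative approach, whereas if it has to follow an imperative approach as written, without optimizing it, it is obliged to work slowly.
The solution below produces exactly the Output 1 result. Note how I determine the unique invoice numbers and then filter them against the valid invoice numbers. These numbers are then batched according to the batch size, which is a parameter and which here counts invoice numbers per batch, not rows. No called templates and no counters of any kind: just a solution using the built-in facilities of XSLT 2.0.
Not counting the declarations of the global parameter and variable, and the comments, it has only five elements: <root>, <xsl:for-each-group>, <batch>, <xsl:for-each> and <xsl:copy-of>.
As for why yours does not work, I do not know. The approach you have taken is not very XSLT-like; it feels like an XSLT expression of some imperative programming approach.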
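To make the grouping key concrete, here is a minimal sketch that is separate from the stylesheet in this answer. It groups the integers 1 to 6 instead of invoice numbers, and the `demo` element and the `index` and `members` attributes are names made up for the illustration. It assumes an XSLT 2.0 processor and any well-formed XML document as input, since the input is not read.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">
  <xsl:output indent="yes"/>
  <!-- same role as the batch-size parameter in the answer -->
  <xsl:param name="batch-size" select="2"/>
  <xsl:template match="/">
    <!-- "demo", "index" and "members" are hypothetical names -->
    <demo>
      <!-- position() is the item's position in the selected sequence,
           so the key is the zero-based batch index of each item -->
      <xsl:for-each-group select="1 to 6"
                          group-by="(position() - 1) idiv $batch-size">
        <batch index="{current-grouping-key()}"
               members="{current-group()}"/>
      </xsl:for-each-group>
    </demo>
  </xsl:template>
</xsl:stylesheet>
```

With batch-size at 2 the key evaluates to 0, 0, 1, 1, 2, 2 for the six items, so they fall into three groups of two. The stylesheet in this answer applies the same key to the sequence of distinct invoice numbers, which is why an invoice can never be split: the grouping happens before any rows are copied.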
t:\ftemp>type numbers.xml
<root>
<row>
<invoiceNumber>1</invoiceNumber>
</row>
<row>
<invoiceNumber>2</invoiceNumber>
</row>
<row>
<invoiceNumber>3</invoiceNumber>
</row>
</root>
t:\ftemp>type invoices.xml
<?xml version="1.0" encoding="UTF-8"?>
<root>
<row>
<invoiceNumber>1</invoiceNumber>
<invoiceText>invoice 1-1</invoiceText>
<position>1</position>
...
</row>
<row>
<invoiceNumber>1</invoiceNumber>
<invoiceText>invoice 1-2</invoiceText>
<position>2</position>
...
</row>
<row>
<invoiceNumber>2</invoiceNumber>
<invoiceText>invoice 2-1</invoiceText>
<position>3</position>
...
</row>
<row>
<invoiceNumber>2</invoiceNumber>
<invoiceText>invoice 2-2</invoiceText>
<position>4</position>
...
</row>
<row>
<invoiceNumber>3</invoiceNumber>
<invoiceText>invoice 3-1</invoiceText>
<position>5</position>
...
</row>
<row>
<invoiceNumber>3</invoiceNumber>
<invoiceText>invoice 3-2</invoiceText>
<position>6</position>
...
</row>
</root>
t:\ftemp>call xslt2 invoices.xml invoices.xsl
<?xml version="1.0" encoding="UTF-8"?>
<root>
<batch>
<row>
<invoiceNumber>1</invoiceNumber>
<invoiceText>invoice 1-1</invoiceText>
<position>1</position>
...
</row>
<row>
<invoiceNumber>1</invoiceNumber>
<invoiceText>invoice 1-2</invoiceText>
<position>2</position>
...
</row>
<row>
<invoiceNumber>2</invoiceNumber>
<invoiceText>invoice 2-1</invoiceText>
<position>3</position>
...
</row>
<row>
<invoiceNumber>2</invoiceNumber>
<invoiceText>invoice 2-2</invoiceText>
<position>4</position>
...
</row>
</batch>
<batch>
<row>
<invoiceNumber>3</invoiceNumber>
<invoiceText>invoice 3-1</invoiceText>
<position>5</position>
...
</row>
<row>
<invoiceNumber>3</invoiceNumber>
<invoiceText>invoice 3-2</invoiceText>
<position>6</position>
...
</row>
</batch>
</root>
t:\ftemp>type invoices.xsl
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:output indent="yes"/>
<xsl:param name="batch-size" select="2"/>
<xsl:variable name="valid-numbers"
select="doc('numbers.xml')/root/row/invoiceNumber"/>
<xsl:template match="/">
<xsl:variable name="invoiceLines" select="root/row"/>
<root>
<!--establish batches from possible non-contiguous invoice numbers-->
<xsl:for-each-group group-by="(position() - 1) idiv $batch-size"
select="distinct-values($invoiceLines/invoiceNumber)[.=$valid-numbers]">
<!--create a batch using all invoice lines for all numbers in group-->
<batch>
<xsl:for-each select="$invoiceLines[invoiceNumber=current-group()]">
<!--copy rows as they are-->
<xsl:copy-of select="."/>
</xsl:for-each>
</batch>
</xsl:for-each-group>
</root>
</xsl:template>
</xsl:stylesheet>
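t:\ftemp>rem Done!
I am editing this answer to add the option below because you state you have 8 million input records, and I think a key lookup table will perform better than my simple variable predicate. It produces the same result, adds one XSLT instruction to the template (it could be done without it, but I find this easier to read) and removes a variable that is no longer needed.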
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:output indent="yes"/>
<xsl:param name="batch-size" select="2"/>
<xsl:variable name="valid-numbers"
select="doc('numbers.xml')/root/row/invoiceNumber"/>
<xsl:key name="invoice-lines-by-invoice-number"
match="row" use="invoiceNumber"/>
<xsl:variable name="input" select="/"/>
<xsl:template match="/">
<root>
<!--establish batches from possible non-contiguous invoice numbers-->
<xsl:for-each-group group-by="(position() - 1) idiv $batch-size"
select="distinct-values(root/row/invoiceNumber)[.=$valid-numbers]">
<!--create a batch using all invoice lines for all numbers in group-->
<batch>
<xsl:for-each select="current-group()">
<xsl:for-each
select="key('invoice-lines-by-invoice-number',.,$input)">
<!--copy rows as they are-->
<xsl:copy-of select="."/>
</xsl:for-each>
</xsl:for-each>
</batch>
</xsl:for-each-group>
</root>
</xsl:template>
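</xsl:stylesheet>
Posted on 2013-09-02 01:27:57
Please do not mark this as the answer, because my earlier answer answers the original question.
The code below answers a secondary question: how to batch by the total number of invoice lines without splitting an invoice across two batches.
I could not think of a declarative way to do this, so the answer below is an imperative recursive solution, but it is written so that an XSLT processor that implements tail recursion will not consume stack space. I also take advantage of native XSLT features (key tables and sequences) that are difficult to emulate in other languages.
The code is tight: only one place actually writes out a batch of invoices, and there is no second block of batch-writing code. I am pleased with how this turned out.
I welcome any suggestions for improvement, or postings of alternative solutions that are tighter than this.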
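The batching decisions depend only on how many lines each invoice has. As a small diagnostic sketch, not part of the solution, the stylesheet below lists that count per invoice number using the same key and `$input` variable as the solution. The `line-counts` and `invoice` element names are hypothetical, and the input is assumed to have the `root/row/invoiceNumber` shape of invoices.xml.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">
  <xsl:output indent="yes"/>
  <xsl:key name="invoice-lines-by-invoice-number"
           match="row" use="invoiceNumber"/>
  <xsl:variable name="input" select="/"/>
  <xsl:template match="/">
    <!-- "line-counts" and "invoice" are hypothetical names -->
    <line-counts>
      <xsl:for-each select="distinct-values(root/row/invoiceNumber)">
        <!-- the context item is an atomic value here, so key() needs
             its third argument to know which document to search -->
        <invoice number="{.}"
                 lines="{count(key('invoice-lines-by-invoice-number', ., $input))}"/>
      </xsl:for-each>
    </line-counts>
  </xsl:template>
</xsl:stylesheet>
```

For the invoices.xml shown below, invoices 1, 2, 3 and 5 have 2 lines each and invoice 4 has 6. With batch-size set to 5, invoices 1 and 2 make a batch of 4 lines; adding invoice 3 would bring it to 6, so a new batch starts. Invoice 4 exceeds the limit on its own and still gets a batch to itself, because an invoice is never split.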
t:\ftemp>type numbers.xml
<root>
<row>
<invoiceNumber>1</invoiceNumber>
</row>
<row>
<invoiceNumber>2</invoiceNumber>
</row>
<row>
<invoiceNumber>3</invoiceNumber>
</row>
<row>
<invoiceNumber>4</invoiceNumber>
</row>
<row>
<invoiceNumber>5</invoiceNumber>
</row>
</root>
t:\ftemp>type invoices.xml
<?xml version="1.0" encoding="UTF-8"?>
<root>
<row>
<invoiceNumber>1</invoiceNumber>
<invoiceText>invoice 1-1</invoiceText>
<position>1</position>
...
</row>
<row>
<invoiceNumber>1</invoiceNumber>
<invoiceText>invoice 1-2</invoiceText>
<position>2</position>
...
</row>
<row>
<invoiceNumber>2</invoiceNumber>
<invoiceText>invoice 2-1</invoiceText>
<position>3</position>
...
</row>
<row>
<invoiceNumber>2</invoiceNumber>
<invoiceText>invoice 2-2</invoiceText>
<position>4</position>
...
</row>
<row>
<invoiceNumber>3</invoiceNumber>
<invoiceText>invoice 3-1</invoiceText>
<position>5</position>
...
</row>
<row>
<invoiceNumber>3</invoiceNumber>
<invoiceText>invoice 3-2</invoiceText>
<position>6</position>
...
</row>
<row>
<invoiceNumber>4</invoiceNumber>
<invoiceText>invoice 4-1</invoiceText>
<position>7</position>
...
</row>
<row>
<invoiceNumber>4</invoiceNumber>
<invoiceText>invoice 4-2</invoiceText>
<position>8</position>
...
</row>
<row>
<invoiceNumber>4</invoiceNumber>
<invoiceText>invoice 4-3</invoiceText>
<position>9</position>
...
</row>
<row>
<invoiceNumber>4</invoiceNumber>
<invoiceText>invoice 4-4</invoiceText>
<position>10</position>
...
</row>
<row>
<invoiceNumber>4</invoiceNumber>
<invoiceText>invoice 4-5</invoiceText>
<position>11</position>
...
</row>
<row>
<invoiceNumber>4</invoiceNumber>
<invoiceText>invoice 4-6</invoiceText>
<position>12</position>
...
</row>
<row>
<invoiceNumber>5</invoiceNumber>
<invoiceText>invoice 5-1</invoiceText>
<position>13</position>
...
</row>
<row>
<invoiceNumber>5</invoiceNumber>
<invoiceText>invoice 5-2</invoiceText>
<position>14</position>
...
</row>
</root>
t:\ftemp>call xslt2 invoices.xml invoices.xsl
<?xml version="1.0" encoding="UTF-8"?>
<root>
<!--Batch max lines: 5-->
<batch>
<!--invoice numbers: 1 2-->
<!--total line count: 4-->
<row>
<invoiceNumber>1</invoiceNumber>
<invoiceText>invoice 1-1</invoiceText>
<position>1</position>
...
</row>
<row>
<invoiceNumber>1</invoiceNumber>
<invoiceText>invoice 1-2</invoiceText>
<position>2</position>
...
</row>
<row>
<invoiceNumber>2</invoiceNumber>
<invoiceText>invoice 2-1</invoiceText>
<position>3</position>
...
</row>
<row>
<invoiceNumber>2</invoiceNumber>
<invoiceText>invoice 2-2</invoiceText>
<position>4</position>
...
</row>
</batch>
<batch>
<!--invoice numbers: 3-->
<!--total line count: 2-->
<row>
<invoiceNumber>3</invoiceNumber>
<invoiceText>invoice 3-1</invoiceText>
<position>5</position>
...
</row>
<row>
<invoiceNumber>3</invoiceNumber>
<invoiceText>invoice 3-2</invoiceText>
<position>6</position>
...
</row>
</batch>
<batch>
<!--invoice numbers: 4-->
<!--total line count: 6-->
<row>
<invoiceNumber>4</invoiceNumber>
<invoiceText>invoice 4-1</invoiceText>
<position>7</position>
...
</row>
<row>
<invoiceNumber>4</invoiceNumber>
<invoiceText>invoice 4-2</invoiceText>
<position>8</position>
...
</row>
<row>
<invoiceNumber>4</invoiceNumber>
<invoiceText>invoice 4-3</invoiceText>
<position>9</position>
...
</row>
<row>
<invoiceNumber>4</invoiceNumber>
<invoiceText>invoice 4-4</invoiceText>
<position>10</position>
...
</row>
<row>
<invoiceNumber>4</invoiceNumber>
<invoiceText>invoice 4-5</invoiceText>
<position>11</position>
...
</row>
<row>
<invoiceNumber>4</invoiceNumber>
<invoiceText>invoice 4-6</invoiceText>
<position>12</position>
...
</row>
</batch>
<batch>
<!--invoice numbers: 5-->
<!--total line count: 2-->
<row>
<invoiceNumber>5</invoiceNumber>
<invoiceText>invoice 5-1</invoiceText>
<position>13</position>
...
</row>
<row>
<invoiceNumber>5</invoiceNumber>
<invoiceText>invoice 5-2</invoiceText>
<position>14</position>
...
</row>
</batch>
</root>
t:\ftemp>type invoices.xsl
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:output indent="yes"/>
<xsl:param name="batch-size" select="5"/>
<xsl:variable name="valid-numbers"
select="doc('numbers.xml')/root/row/invoiceNumber"/>
<xsl:key name="invoice-lines-by-invoice-number"
match="row" use="invoiceNumber"/>
<xsl:variable name="input" select="/"/>
<xsl:template match="/">
<root>
<xsl:text>
 </xsl:text>
<xsl:comment select="'Batch max lines:',$batch-size"/>
<xsl:text>
 </xsl:text>
<xsl:call-template name="next-batch">
<xsl:with-param name="remaining-numbers"
select="distinct-values(root/row/invoiceNumber)[.=$valid-numbers]"/>
</xsl:call-template>
</root>
</xsl:template>
<xsl:template name="next-batch">
<xsl:param name="this-batch-lines" select="0"/>
<xsl:param name="this-batch-numbers" select="()"/>
<xsl:param name="remaining-numbers" required="yes"/>
<xsl:variable name="this-invoice" select="$remaining-numbers[1]"/>
<xsl:variable name="this-invoice-lines"
select="count(key('invoice-lines-by-invoice-number',$this-invoice,$input))"/>
<xsl:choose>
<xsl:when test="not($this-invoice) and not($this-batch-lines)">
<!--nothing to clean up and nothing more to do-->
</xsl:when>
<xsl:when test="not($this-invoice) (:last invoice complete:) or
( $this-batch-lines > 0 (:never write an empty batch:) and
$this-batch-lines + $this-invoice-lines > $batch-size )
(:this invoice exceeds limit:)">
<!--clean up previous unfinished batch-->
<batch>
<xsl:text>
 </xsl:text>
<xsl:comment select="'invoice numbers:',$this-batch-numbers"/>
<xsl:text>
 </xsl:text>
<xsl:comment select="'total line count:',$this-batch-lines"/>
<xsl:text>
 </xsl:text>
<xsl:copy-of select="for $num in $this-batch-numbers return
key('invoice-lines-by-invoice-number',$num,$input)"/>
</batch>
<xsl:if test="$this-invoice">
<!--continue with the next batch comprised of this invoice only-->
<xsl:call-template name="next-batch">
<xsl:with-param name="this-batch-lines"
select="$this-invoice-lines"/>
<xsl:with-param name="this-batch-numbers"
select="$this-invoice"/>
<xsl:with-param name="remaining-numbers"
select="$remaining-numbers[position()>1]"/>
</xsl:call-template>
</xsl:if>
<!--the cleaned up batch was the last batch, template recursion ends-->
</xsl:when>
<xsl:otherwise>
<!--a batch limit has not been exceeded; add this invoice to batch-->
<xsl:call-template name="next-batch">
<xsl:with-param name="this-batch-lines"
select="$this-batch-lines + $this-invoice-lines"/>
<xsl:with-param name="this-batch-numbers"
select="($this-batch-numbers,$this-invoice)"/>
<xsl:with-param name="remaining-numbers"
select="$remaining-numbers[position()>1]"/>
</xsl:call-template>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
https://stackoverflow.com/questions/18392000