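This question has been heavily edited to make things clearer.
I am trying to extract data from the Electronic Code of Federal Regulations XML feed (http://www.gpo.gov/fdsys/bulkdata/CFR/2015/title-15/CFR-2015-title15-vol2.xml) but am running into problems.
Specifically, I want to grab data that is matched by a combination of node and attribute. The XML snippet below shows some of the text I want to grab. I want to get the data for every FP node where the attribute FP-2 is present. I also want to get the data for every FP node that has the attribute FP-1.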
<APPENDIX>
<EAR>Pt. 774, Supp. 1</EAR>
<HD SOURCE="HED">Supplement No. 1 to Part 774—The Commerce Control List</HD>
<HD SOURCE="HD1">Category 0—Nuclear Materials, Facilities, and Equipment [and Miscellaneous Items]</HD>
<HD SOURCE="HD1">A. “End Items,” “Equipment,” “Accessories,” “Attachments,” “Parts,” “Components,” and “Systems”</HD>
<FP SOURCE="FP-2">
<E T="02">0A002Power generating or propulsion equipment “specially designed” for use with space, marine or mobile “nuclear reactors”. (These items are “subject to the ITAR.” See 22 CFR parts 120 through 130.)</E>
</FP>
<FP SOURCE="FP-2">
<E T="02">0A018Items on the Wassenaar Munitions List (see List of Items Controlled).</E>
</FP>
<FP SOURCE="FP-1">
<E T="04">License Requirements</E>
</FP>
<FP SOURCE="FP-1">
<E T="03">Reason for Control:</E> NS, AT, UN</FP>
<GPOTABLE CDEF="s50,r50" COLS="2" OPTS="L2">
<BOXHD>
<CHED H="1">Control(s)</CHED>
<CHED H="1">Country Chart (See Supp. No. 1 to part 738)</CHED>
</BOXHD>
<ROW>
<ENT I="01">NS applies to entire entry</ENT>
<ENT>NS Column 1.</ENT>
</ROW>
<ROW>
<ENT I="01">AT applies to entire entry</ENT>
<ENT>AT Column 1.</ENT>
</ROW>
<ROW>
<ENT I="01">UN applies to entire entry</ENT>
<ENT>See § 746.1(b) for UN controls.</ENT>
</ROW>
</GPOTABLE>
<FP SOURCE="FP-1">
<E T="05">List Based License Exceptions (See Part 740 for a description of all license exceptions)</E>
</FP>
<FP SOURCE="FP-1">
<E T="03">LVS:</E> $3,000 for 0A018.b</FP>
<FP SOURCE="FP-1">$1,500 for 0A018.c and .d</FP>
<FP SOURCE="FP-1">
<E T="03">GBS:</E> N/A</FP>
<FP SOURCE="FP-1">
<E T="03">CIV:</E> N/A</FP>
<FP SOURCE="FP-1">
<E T="04">List of Items Controlled</E>
</FP>
<FP SOURCE="FP-1">
<E T="03">Related Controls:</E> (1) See also 0A979, 0A988, and 22 CFR 121.1 Categories I(a), III(b-d), and X(a). (2) See ECCN 0A617.y.1 and .y.2 for items formerly controlled by ECCN 0A018.a. (3) See ECCN 1A613.c for military helmets providing less than NIJ Type IV protection and ECCN 1A613.y.1 for conventional military steel helmets that, immediately prior to July 1, 2014, were classified under 0A018.d and 0A988. (4) See 22 CFR 121.1 Category X(a)(5) and (a)(6) for controls on other military helmets.</FP>
<FP SOURCE="FP-1">
<E T="03">Related Definitions:</E> N/A</FP>
<FP>
<E T="03">Items:</E> a. [Reserved]</FP>
<P>b. “Specially designed” components and parts for ammunition, except cartridge cases, powder bags, bullets, jackets, cores, shells, projectiles, boosters, fuses and components, primers, and other detonating devices and ammunition belting and linking machines (all of which are “subject to the ITAR.” (See 22 CFR parts 120 through 130);</P>
<NOTE>
<HD SOURCE="HED">
<E T="03">Note:</E>
</HD>
<P>
<E T="03">0A018.b does not apply to “components” “specially designed” for blank or dummy ammunition as follows:</E>
</P>
<P>
<E T="03">a. Ammunition crimped without a projectile (blank star);</E>
</P>
</APPENDIX>
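To complicate matters, I am trying to get this data into FileMaker, but for the purposes of this edit I will stick with plain XSL.
The XSL below grabs all FP nodes indiscriminately.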
<?xml version='1.0' encoding='UTF-8'?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:for-each select="//FP">
<xsl:value-of select="."/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Modifying it to use xsl:template match="FP[@SOURCE='FP-1']" lets me match on the attribute as needed, but I am still unclear on how to capture the data I want.
Posted 2015-08-08 06:43:58
A few things:
Consider the following XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8"/>
<xsl:template match="/">
<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
<METADATA>
<FIELD NAME="ECCNFP_2" TYPE="TEXT"/>
<FIELD NAME="ECCNFP_1" TYPE="TEXT"/>
</METADATA>
<RESULTSET>
<xsl:for-each select="//FP[@SOURCE = 'FP-2']/E[@T='02']">
<ROW>
<COL>
<DATA><xsl:value-of select="substring(.,1,5)"/></DATA>
</COL>
</ROW>
</xsl:for-each>
<xsl:for-each select="//FP[@SOURCE = 'FP-1']/E[@T='02']">
<ROW>
<COL>
<DATA><xsl:value-of select="substring(.,1,5)"/></DATA>
</COL>
</ROW>
</xsl:for-each>
</RESULTSET>
</FMPXMLRESULT>
</xsl:template>
</xsl:stylesheet>
This will output:
<?xml version='1.0' encoding='UTF-8'?>
<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
<METADATA>
<FIELD NAME="ECCNFP_2" TYPE="TEXT"/>
<FIELD NAME="ECCNFP_1" TYPE="TEXT"/>
</METADATA>
<RESULTSET>
<ROW>
<COL>
<DATA>0A002</DATA>
</COL>
</ROW>
<ROW>
<COL>
<DATA>0A018</DATA>
</COL>
</ROW>
</RESULTSET>
</FMPXMLRESULT>
And here is partial output for the full XML at the web link:
<?xml version='1.0' encoding='UTF-8'?>
<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
<METADATA>
<FIELD NAME="ECCNFP_2" TYPE="TEXT"/>
<FIELD NAME="ECCNFP_1" TYPE="TEXT"/>
</METADATA>
<RESULTSET>
<ROW>
<COL>
<DATA>2A000</DATA>
</COL>
</ROW>
<ROW>
<COL>
<DATA>0A002</DATA>
</COL>
</ROW>
<ROW>
<COL>
<DATA>0A018</DATA>
</COL>
</ROW>
<ROW>
<COL>
<DATA>0A521</DATA>
</COL>
</ROW>
<ROW>
<COL>
<DATA>0A604</DATA>
</COL>
</ROW>
<ROW>
<COL>
<DATA>0A606</DATA>
</COL>
</ROW>
...
In fact, point your XSLT processor at the GPO link and it outputs all of the FP1s and FP2s. I just did it with Python! Nearly 3,000 rows!
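For readers who would rather not run an XSLT processor, the same selection can be sketched in Python with only the standard library. This is not the code the answer's author ran (that code is not shown in the thread); the helper name `extract_eccns` and the trimmed sample string are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample modelled on the APPENDIX snippet in the question.
SAMPLE = """<APPENDIX>
<FP SOURCE="FP-2">
<E T="02">0A002Power generating or propulsion equipment</E>
</FP>
<FP SOURCE="FP-2">
<E T="02">0A018Items on the Wassenaar Munitions List</E>
</FP>
<FP SOURCE="FP-1">
<E T="04">License Requirements</E>
</FP>
</APPENDIX>"""


def extract_eccns(xml_text, source="FP-2", emphasis="02"):
    """Return the first five characters of each matching E element.

    Mirrors the XPath //FP[@SOURCE='FP-2']/E[@T='02'] combined with
    substring(., 1, 5) from the stylesheet above.
    """
    root = ET.fromstring(xml_text)
    path = f".//FP[@SOURCE='{source}']/E[@T='{emphasis}']"
    return ["".join(e.itertext())[:5] for e in root.findall(path)]


print(extract_eccns(SAMPLE))
```

On this sample, calling the helper with `source="FP-1"` returns an empty list, because none of the FP-1 paragraphs contain an E element with T="02". The same holds for the snippet in the question, which is why only the FP-2 codes appear in the first output above.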
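Posted 2015-08-08 08:39:57
Your question is still unclear. If I focus on this part:
I want to get the data for every FP node where the attribute FP-2 is present. I also want to get the data for every FP node that has the attribute FP-1.
then you probably want to change this:
<xsl:for-each select="//FP">
to:
<xsl:for-each select="//FP[@SOURCE='FP-1' or @SOURCE='FP-2']">
Note that this returns the value of every FP element whose SOURCE attribute has the value 'FP-1' or 'FP-2'. I do not see any "FP node where the attribute FP-2 is present" in your input: FP-1 and FP-2 are values of the SOURCE attribute, not attributes themselves.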
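As a rough cross-check of what that predicate selects, here is a minimal Python sketch using the standard library. `xml.etree.ElementTree` supports only a subset of XPath and has no `or` operator, so the filter is written in Python instead. The function name `fp_text` and the trimmed sample are assumptions, not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample modelled on the snippet in the question.
SAMPLE = """<APPENDIX>
<FP SOURCE="FP-2">
<E T="02">0A018Items on the Wassenaar Munitions List (see List of Items Controlled).</E>
</FP>
<FP SOURCE="FP-1">
<E T="03">Reason for Control:</E> NS, AT, UN</FP>
<FP>
<E T="03">Items:</E> a. [Reserved]</FP>
</APPENDIX>"""

WANTED = {"FP-1", "FP-2"}


def fp_text(xml_text):
    """Yield (SOURCE, text) for FP elements whose SOURCE is FP-1 or FP-2."""
    root = ET.fromstring(xml_text)
    for fp in root.iter("FP"):
        source = fp.get("SOURCE")
        if source in WANTED:
            # itertext() gathers the text of FP and its descendants, which is
            # what value-of "." returns; whitespace is then collapsed for
            # readability (an extra step the stylesheet does not perform).
            text = " ".join("".join(fp.itertext()).split())
            yield source, text


for source, text in fp_text(SAMPLE):
    print(source, "|", text)
```

The FP element without a SOURCE attribute (the "Items:" paragraph) is skipped, just as it would be by the XPath predicate.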
Also note that the // syntax is expensive in terms of processing power. You will get better performance if you use a full, explicit path.
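To illustrate the difference between a descendant search and an explicit path, here is a small Python sketch. It assumes the structure of the question's snippet, where the FP elements are direct children of APPENDIX; the real path in the full GPO file is not shown in this thread, so it is not guessed here. No timing is claimed: the sketch only shows that both forms select the same elements once the structure is known.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed sample: FP elements sit directly under APPENDIX.
SAMPLE = """<APPENDIX>
<FP SOURCE="FP-2"><E T="02">0A002Power generating equipment</E></FP>
<NOTE><P>No FP elements in here, but a descendant search still visits it.</P></NOTE>
<FP SOURCE="FP-1"><E T="04">License Requirements</E></FP>
</APPENDIX>"""

root = ET.fromstring(SAMPLE)

# Descendant search: walks the whole subtree below APPENDIX, like //FP.
anywhere = root.findall(".//FP")

# Explicit path: only looks at direct children of APPENDIX, like /APPENDIX/FP.
direct = root.findall("FP")

print([fp.get("SOURCE") for fp in anywhere])
print([fp.get("SOURCE") for fp in direct])
```

In XSLT terms the equivalent change would be replacing //FP with the full path from the document root down to FP, which has to be read off the actual file.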
https://stackoverflow.com/questions/31887091