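My Goal: I want to get every element named "SECTION" in the subject XML document; that is, each section together with everything beneath it.
Constraint: I must use LibXML Ruby; that is, require 'LibXML'.
Problem: The output data is truncated.
The problem (see the output for file1.xml)
Note: The output for file2.xml shows even more severe truncation. I included it in case it clarifies anything.
Here is the code: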
#!/usr/bin/ruby
require "xml"
reader = XML::Reader.file('infile2.xml')
while reader.read
  node = reader.node
  if node.name == "SECTION"
    iteration = XML::Document.string(node.to_s)
    puts iteration
    puts "\n"
  end
end
Input file1.xml:
<?xml version="1.0"?>
<SECTION>
<SECTNO>§ 0.735-1</SECTNO>
<SUBJECT>Agency ethics officials.</SUBJECT>
<P>(a) <E T="03">Designated Agency Ethics Official (DAEO).</E> The Assistant General Counsel (023) is the designated agency ethics official (DAEO) for the Department of Veterans Affairs. The Deputy Assistant General Counsel (023C) is the alternate DAEO, who is designated to act in the DAEO's absence. The DAEO has primary responsibility for the administration, coordination, and management of the VA ethics program, pursuant to 5 CFR 2638.201-204.</P>
<P>(b) <E T="03">Deputy ethics officials.</E> (1) The Regional Counsel are deputy ethics officials. They have been delegated the authority to act for the DAEO within their jurisdiction, under the DAEO's supervision, pursuant to 5 CFR 2638.204.</P>
<P>(2) The alternate DAEO, the DAEO's staff, and staff in the Offices of Regional Counsel, may also act as deputy ethics officials pursuant to delegations of one or more of the DAEO's duties from the DAEO or the Regional Counsel.</P>
<CITA>[58 FR 61813, Nov. 23, 1993. Redesignated at 61 FR 11309, Mar. 20, 1996]</CITA>
</SECTION>
Output, given input file1.xml (above):
<?xml version="1.0" encoding="UTF-8"?>
<SECTION>
<SECTNO>§ 0.735-1</SECTNO>
<SUBJECT>Agency ethics officials.</SUBJECT>
<P>(a) <E T="03">Designated Agency Ethics Official (DAEO).</E> The Assistant General Counsel (023) is the designated agency ethics official (DAEO) for the Department of Veterans Affairs. The Deputy Assistant General Counsel (023C) is the alternate DAEO, who is designated to act in the DAEO's absence. The DAEO has primary responsibility for the administration, coordination, and management of the VA ethic</P></SECTION>
<?xml version="1.0" encoding="UTF-8"?>
<SECTION/>
Input file2.xml:
<?xml version="1.0"?>
<SUBPART>
<HD SOURCE="HED">Subpart A—General Provisions</HD>
<SECTION>
<SECTNO>§ 0.735-1</SECTNO>
<SUBJECT>Agency ethics officials.</SUBJECT>
<P>(a) <E T="03">Designated Agency Ethics Official (DAEO).</E> The Assistant General Counsel (023) is the designated agency ethics official (DAEO) for the Department of Veterans Affairs. The Deputy Assistant General Counsel (023C) is the alternate DAEO, who is designated to act in the DAEO's absence. The DAEO has primary responsibility for the administration, coordination, and management of the VA ethics program, pursuant to 5 CFR 2638.201-204.</P>
<P>(b) <E T="03">Deputy ethics officials.</E> (1) The Regional Counsel are deputy ethics officials. They have been delegated the authority to act for the DAEO within their jurisdiction, under the DAEO's supervision, pursuant to 5 CFR 2638.204.</P>
<P>(2) The alternate DAEO, the DAEO's staff, and staff in the Offices of Regional Counsel, may also act as deputy ethics officials pursuant to delegations of one or more of the DAEO's duties from the DAEO or the Regional Counsel.</P>
<CITA>[58 FR 61813, Nov. 23, 1993. Redesignated at 61 FR 11309, Mar. 20, 1996]</CITA>
</SECTION>
<SECTION>
<SECTNO>§ 0.735-2</SECTNO>
<SUBJECT>Government-wide standards.</SUBJECT>
<P>For government-wide standards of ethical conduct and related responsibilities for Federal employees, see 5 CFR Part 735 and Chapter XVI.</P>
<CITA>[61 FR 11309, Mar. 20, 1996. Redesignated at 63 FR 33579, June 19, 1998]</CITA>
</SECTION>
</SUBPART>
Output, given input file2.xml (above):
<?xml version="1.0" encoding="UTF-8"?>
<SECTION>
<SECTNO>§ 0.735-1</SECTNO>
<SUBJECT>Agency ethics officials.</SUBJECT>
<P>(a) <E T="03">Designated Agency Ethics Official (DAEO).</E></P></SECTION>
<?xml version="1.0" encoding="UTF-8"?>
<SECTION/>
<?xml version="1.0" encoding="UTF-8"?>
<SECTION>
<SECTNO>§ 0.735-2</SECTNO>
<SUBJECT>Government-wide standards.</SUBJECT>
<P>For government-wide standards of ethical conduct and related responsibilities for Federal employees, see 5 CFR Part 735 and Chapter XVI.</P>
<CITA/></SECTION>
<?xml version="1.0" encoding="UTF-8"?>
<SECTION/>
Posted on 2013-03-08 20:38:47
Unless you have a huge XML document, consider something like the following:
require "xml"
doc = XML::Document.file('infile1.xml')
doc.find('/SECTION').each do |s|
  puts "[#{s}]"
end
This produces the following output:
<SECTION>
<SECTNO>§ 0.735-1</SECTNO>
<SUBJECT>Agency ethics officials.</SUBJECT>
<P>(a) <E T="03">Designated Agency Ethics Official (DAEO).</E> The Assistant General Counsel (023) is the designated agency ethics official (DAEO) for the Department of Veterans Affairs. The Deputy Assistant General Counsel (023C) is the alternate DAEO, who is designated to act in the DAEO's absence. The DAEO has primary responsibility for the administration, coordination, and management of the VA ethics program, pursuant to 5 CFR 2638.201-204.</P>
<P>(b) <E T="03">Deputy ethics officials.</E> (1) The Regional Counsel are deputy ethics officials. They have been delegated the authority to act for the DAEO within their jurisdiction, under the DAEO's supervision, pursuant to 5 CFR 2638.204.</P>
<P>(2) The alternate DAEO, the DAEO's staff, and staff in the Offices of Regional Counsel, may also act as deputy ethics officials pursuant to delegations of one or more of the DAEO's duties from the DAEO or the Regional Counsel.</P>
<CITA>[58 FR 61813, Nov. 23, 1993. Redesignated at 61 FR 11309, Mar. 20, 1996]</CITA>
</SECTION>
This does not answer the question; rather, it is a workaround.
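I don't know what the actual problem with using the reader is, but I suspect it is unrelated to the cursor. For example, the following works, although for the first XML document there is still an extra empty section: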
if node.name == "SECTION"
  puts "#{reader.read_outer_xml}"
end
https://stackoverflow.com/questions/15301717