
Parsing XML with LibXML truncates data

Stack Overflow user
Asked 2013-03-08 19:24:50
Answers: 1, Views: 180, Followers: 0, Votes: 0

My Goal: I want to get every element named "SECTION" in the subject XML document; that is, grab each section and everything beneath it.

Constraint: I must use LibXML Ruby; that is, the library loaded by require "xml" in the code below.

Problem: The output data is truncated.

Questions (see the output for file1.xml)

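  • Why is the output for file1.xml truncated? Note: most of the text between the first <P>(a) ... </P> tags is missing (the truncation starts in the word "ethics").
  • Why does the code drop the last two P elements (<P>(b) ..., <P>(2) ...) and the CITA element? What causes <?xml version="1.0" encoding="UTF-8"?> and <SECTION/> to appear at the end of the output?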

Note: The output for file2.xml is truncated even more severely. I include it in case it clarifies anything.

Here is the code:

#!/usr/bin/ruby
require "xml"

reader = XML::Reader.file('infile2.xml')
while reader.read
  node = reader.node
  if node.name == "SECTION"
    iteration = XML::Document.string(node.to_s)
    puts iteration
    puts "\n"
  end
end

Input file1.xml:

<?xml version="1.0"?>
<SECTION>
  <SECTNO>§ 0.735-1</SECTNO>
  <SUBJECT>Agency ethics officials.</SUBJECT>
  <P>(a) <E T="03">Designated Agency Ethics Official (DAEO).</E> The Assistant General Counsel (023) is the designated agency ethics official (DAEO) for the Department of Veterans Affairs. The Deputy Assistant General Counsel (023C) is the alternate DAEO, who is designated to act in the DAEO's absence. The DAEO has primary responsibility for the administration, coordination, and management of the VA ethics program, pursuant to 5 CFR 2638.201-204.</P>
  <P>(b) <E T="03">Deputy ethics officials.</E> (1) The Regional Counsel are deputy ethics officials. They have been delegated the authority to act for the DAEO within their jurisdiction, under the DAEO's supervision, pursuant to 5 CFR 2638.204.</P>
  <P>(2) The alternate DAEO, the DAEO's staff, and staff in the Offices of Regional Counsel, may also act as deputy ethics officials pursuant to delegations of one or more of the DAEO's duties from the DAEO or the Regional Counsel.</P>
  <CITA>[58 FR 61813, Nov. 23, 1993. Redesignated at 61 FR 11309, Mar. 20, 1996]</CITA>
</SECTION>

Output, given input file1.xml (above):

<?xml version="1.0" encoding="UTF-8"?>
<SECTION>
  <SECTNO>§ 0.735-1</SECTNO>
  <SUBJECT>Agency ethics officials.</SUBJECT>
  <P>(a) <E T="03">Designated Agency Ethics Official (DAEO).</E> The Assistant General Counsel (023) is the designated agency ethics official (DAEO) for the Department of Veterans Affairs. The Deputy Assistant General Counsel (023C) is the alternate DAEO, who is designated to act in the DAEO's absence. The DAEO has primary responsibility for the administration, coordination, and management of the VA ethic</P></SECTION>

<?xml version="1.0" encoding="UTF-8"?>
<SECTION/>

Input file2.xml:

<?xml version="1.0"?>
<SUBPART>
  <HD SOURCE="HED">Subpart A—General Provisions</HD>
  <SECTION>
    <SECTNO>§ 0.735-1</SECTNO>
    <SUBJECT>Agency ethics officials.</SUBJECT>
    <P>(a) <E T="03">Designated Agency Ethics Official (DAEO).</E> The Assistant General Counsel (023) is the designated agency ethics official (DAEO) for the Department of Veterans Affairs. The Deputy Assistant General Counsel (023C) is the alternate DAEO, who is designated to act in the DAEO's absence. The DAEO has primary responsibility for the administration, coordination, and management of the VA ethics program, pursuant to 5 CFR 2638.201-204.</P>
    <P>(b) <E T="03">Deputy ethics officials.</E> (1) The Regional Counsel are deputy ethics officials. They have been delegated the authority to act for the DAEO within their jurisdiction, under the DAEO's supervision, pursuant to 5 CFR 2638.204.</P>
    <P>(2) The alternate DAEO, the DAEO's staff, and staff in the Offices of Regional Counsel, may also act as deputy ethics officials pursuant to delegations of one or more of the DAEO's duties from the DAEO or the Regional Counsel.</P>
    <CITA>[58 FR 61813, Nov. 23, 1993. Redesignated at 61 FR 11309, Mar. 20, 1996]</CITA>
  </SECTION>
  <SECTION>
    <SECTNO>§ 0.735-2</SECTNO>
    <SUBJECT>Government-wide standards.</SUBJECT>
    <P>For government-wide standards of ethical conduct and related responsibilities for Federal employees, see 5 CFR Part 735 and Chapter XVI.</P>
    <CITA>[61 FR 11309, Mar. 20, 1996. Redesignated at 63 FR 33579, June 19, 1998]</CITA>
  </SECTION>
</SUBPART>

Output, given input file2.xml (above):

<?xml version="1.0" encoding="UTF-8"?>
<SECTION>
    <SECTNO>§ 0.735-1</SECTNO>
    <SUBJECT>Agency ethics officials.</SUBJECT>
    <P>(a) <E T="03">Designated Agency Ethics Official (DAEO).</E></P></SECTION>

<?xml version="1.0" encoding="UTF-8"?>
<SECTION/>

<?xml version="1.0" encoding="UTF-8"?>
<SECTION>
    <SECTNO>§ 0.735-2</SECTNO>
    <SUBJECT>Government-wide standards.</SUBJECT>
    <P>For government-wide standards of ethical conduct and related responsibilities for Federal employees, see 5 CFR Part 735 and Chapter XVI.</P>
    <CITA/></SECTION>

<?xml version="1.0" encoding="UTF-8"?>
<SECTION/>

1 Answer

Stack Overflow user

Answered 2013-03-08 20:38:47

Unless you have a huge XML document, consider something like the following:

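require "xml"

# Load the whole document into memory and select SECTION with XPath.
# '/SECTION' matches only a root element named SECTION, as in infile1.xml.
doc = XML::Document.file('infile1.xml')
doc.find('/SECTION').each do |s|
  puts s
end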

This produces the following output:

<SECTION>
  <SECTNO>§ 0.735-1</SECTNO>
  <SUBJECT>Agency ethics officials.</SUBJECT>
  <P>(a) <E T="03">Designated Agency Ethics Official (DAEO).</E> The Assistant General Counsel (023) is the designated agency ethics official (DAEO) for the Department of Veterans Affairs. The Deputy Assistant General Counsel (023C) is the alternate DAEO, who is designated to act in the DAEO's absence. The DAEO has primary responsibility for the administration, coordination, and management of the VA ethics program, pursuant to 5 CFR 2638.201-204.</P>
  <P>(b) <E T="03">Deputy ethics officials.</E> (1) The Regional Counsel are deputy ethics officials. They have been delegated the authority to act for the DAEO within their jurisdiction, under the DAEO's supervision, pursuant to 5 CFR 2638.204.</P>
  <P>(2) The alternate DAEO, the DAEO's staff, and staff in the Offices of Regional Counsel, may also act as deputy ethics officials pursuant to delegations of one or more of the DAEO's duties from the DAEO or the Regional Counsel.</P>
  <CITA>[58 FR 61813, Nov. 23, 1993. Redesignated at 61 FR 11309, Mar. 20, 1996]</CITA>
</SECTION>

This does not answer the question; rather, it is a workaround.

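I don't know what the actual problem with the reader is, but I suspect it is related to the reader's cursor. For example, the following works, although for the first XML document it still prints an extra empty section: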

if node.name == "SECTION"
  puts reader.read_outer_xml
end
Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/15301717
