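I'm trying to use DocumentBuilder to parse a large file that contains multiple documents. When I run my program, I get this error: "The markup in the document following the root element must be well-formed."
I think this is because my file has no actual root element. It is a TextEdit document structured as follows: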
<DOC>
<DOCNO> AP890106-0001 </DOCNO>
<FILEID>AP-NR-01-06-89 0033EST</FILEID>
<FIRST>r a PM-BRF--Heidnik 01-06 0136</FIRST>
<SECOND>PM-BRF--Heidnik,0139</SECOND>
<HEAD>Torture-Murderer In Fair Condition, Conscious</HEAD>
<DATELINE>PITTSBURGH (AP) </DATELINE>
<TEXT>
Convicted torture-murderer Gary Heidnik has
regained consciousness after apparently attempting suicide in his
prison cell with a drug overdose, prison officials said.
Heidnik's condition was upgraded to fair Thursday, but he
remained under tight security in the intensive care unit of West
Penn Hospital, said Tom Seiverling, a spokesman for the State
Correctional Institution at Pittsburgh.
Heidnik, 45, was semi-comatose earlier this week after being
found unconscious in his cell Sunday. Prison officials believe
Heidnik stored up medications that were prescribed for him by
pretending to take them at the designated times.
The self-proclaimed minister faces the death sentence for the
slayings of two of six women he kept chained in the basement of his
Philadelphia row house. He was convicted and sentenced last July.
</TEXT>
</DOC>
<DOC>
<DOCNO> AP890106-0002 </DOCNO>
<FILEID>AP-NR-01-06-89 0524EST</FILEID>
<FIRST>d a PM-BRF--DrivingToddler 01-06 0162</FIRST>
<SECOND>PM-BRF--Driving Toddler,0166</SECOND>
<HEAD>3-Year-Old Takes Careening First Drive; Emerges Unharmed</HEAD>
<DATELINE>CAZENOVIA, N.Y. (AP) </DATELINE>
<TEXT>
Going out to buy a puppy, Cecilia Kaler
placed her three-year-old son in a child seat, left the car running
and got out to clear snow from the windshield. She never finished
the job.
As soon as his mother closed the door, little Michael Kaler
locked it, put the car in drive, and rode away Wednesday. The car
went down the driveway, across a busy road, narrowly missed a tree
and fire hydrant, rolled on its side down an embankment and finally
came to rest in a creek.
Michael was wet, cold and otherwise unharmed, said Kaler, a
resident of this community 15 miles southeast of Syracuse.
A nearby man heard Kaler screaming and rushed over. He smashed a
window and freed little Michael.
``Anybody who says there's no God doesn't know what they're
talking about, because someone certainly was looking out for him,''
Kaler said Thursday.
</TEXT>
</DOC>

I want to separate the individual documents using the tag names <DOC> and </DOC>.
Here is my code so far:
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(document);
doc.getElementsByTagName("doc").toString();

Posted on 2020-02-11 04:17:05
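The file cannot be parsed as it stands, because it has no single root element. Your
<doc> </doc> blocks must be enclosed in another container element; choose whatever name you like. Once the XML is well-formed, you can parse it.
Example: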
<mytag>
<doc> ........</doc>
<doc>........... </doc>
</mytag>

https://stackoverflow.com/questions/60157575
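If editing the file by hand is impractical, the wrapper element can also be added while reading. The following is only a minimal sketch of that idea, not a definitive implementation: the class name MultiDocParser and the helper parseWrapped are hypothetical names chosen for illustration, and it assumes the file has no XML declaration at the top and is UTF-8 compatible. It uses SequenceInputStream to place <mytag> and </mytag> around the original stream before handing it to DocumentBuilder.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class, named here only for illustration.
public class MultiDocParser {

    // Surrounds the raw stream with a synthetic <mytag> root so the parser
    // sees exactly one root element. Assumes the input has no XML declaration.
    static Document parseWrapped(InputStream raw) throws Exception {
        InputStream open = new ByteArrayInputStream("<mytag>".getBytes(StandardCharsets.UTF_8));
        InputStream close = new ByteArrayInputStream("</mytag>".getBytes(StandardCharsets.UTF_8));
        InputStream wrapped = new SequenceInputStream(
                Collections.enumeration(Arrays.asList(open, raw, close)));
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(wrapped);
    }

    public static void main(String[] args) throws Exception {
        // Small stand-in for the real file; in practice pass a FileInputStream.
        String sample = "<DOC>\n<DOCNO> AP890106-0001 </DOCNO>\n</DOC>\n"
                      + "<DOC>\n<DOCNO> AP890106-0002 </DOCNO>\n</DOC>\n";
        Document doc = parseWrapped(
                new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8)));

        // Element names are case-sensitive in XML, so look up "DOC", not "doc".
        NodeList docs = doc.getElementsByTagName("DOC");
        for (int i = 0; i < docs.getLength(); i++) {
            Element d = (Element) docs.item(i);
            String docNo = d.getElementsByTagName("DOCNO").item(0).getTextContent().trim();
            System.out.println(docNo);
        }
    }
}
```

Note that the lookup uses "DOC" in upper case to match the tags in the file. Wrapping only fixes the missing root element: if the article text contains raw characters such as & or <, the parser will still reject the file until those are escaped.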