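I'm trying to extract some specific data from a huge (over 1.5 million lines) XML document using xmllint on a Linux system, but I'm not very familiar with xmllint syntax. I have been doing this with grep and awk, which is very inefficient. Then I found that this system has the xmllint utility (which I have never used), and since the XML is well formed, I figured there should be a way to access the data directly. I've included a snippet of the XML document, but while trimming it down I caused xmllint to report a parser error, even though I think the snippet is correct. I figure that if you're proficient enough with xmllint to answer my question, you can probably spot the parser error easily.
Based on web searches, I've tried the following: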
echo 'cat //*/@index' | xmllint --shell stub.xml (which does return ALL of the "indexes")
and
test=$(xmllint --debug --xpath "//PTC/BPSETS/BPSET/BPS" stub.xml) (which does dump the entire BPS entry)
and
xmllint --xpath "string(//PTC/BPSETS/BPSET/@b95)" stub.xml (returns no values)
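The last attempt comes back empty because `@b95` selects an attribute *named* `b95`, whereas `b95` is the *value* of the `define` attribute. XPath filters on an attribute's value with a predicate in square brackets, e.g. `BPSET[@define="b95"]`. A minimal sketch, assuming a hypothetical trimmed-down copy of the document written to a temp file:

```shell
#!/bin/sh
# Hypothetical trimmed-down stand-in for stub.xml, written to a temp file
# so the example is self-contained.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<PTC version="2.0" cls="2">
  <BPSETS>
    <BPSET define="b95">
      <BPS define="88lmax">
        <PNS>
          <PN index="0" atv="1" bf="32203506"><AWD cpbt="390"/></PN>
        </PNS>
      </BPS>
    </BPSET>
  </BPSETS>
</PTC>
EOF

# @b95 asks for an attribute *named* "b95"; none exists, so string() is empty.
wrong=$(xmllint --xpath 'string(//PTC/BPSETS/BPSET/@b95)' "$xml")

# "b95" is the *value* of the define attribute: filter with a predicate,
# then step down to the data actually wanted (here the BPS define value).
right=$(xmllint --xpath 'string(//PTC/BPSETS/BPSET[@define="b95"]/BPS/@define)' "$xml")

printf 'wrong: [%s]\nright: [%s]\n' "$wrong" "$right"
rm -f "$xml"
```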
Here is the XML snippet, trimmed down as far as I could:
<?xml version="1.0" encoding="utf-8"?>
<PTC version="2.0" cls="2">
<BPSETS>
<BPSET define="b95">
<BPS define="88lmax">
<CRIT>
<MNBS lmt="88" />
<MXBS lmt="88" />
<MXBT red="Y" />
</CRIT>
<PNS>
<PN index="0" atv="1" bf="32203506">
<AWD cpbt="390">
<BUNS ptgp="bn38" bdx="38" fawd="1" />
<BUNS ptgp="bn39" bdx="39" fawd="1" awby="38" />
</AWD>
</PN>
<PN index="1" atv="1" bf="24237243">
<AWD cpbt="390">
<BUNS ptgp="bn38" bdx="38" fawd="1" />
<BUNS ptgp="bn39" bdx="39" fawd="1" awby="38" />
</AWD>
</PN>
<PN index="2" atv="1" bf="8136575">
<AWD cpbt="390">
<BUNS ptgp="bn38" bdx="38" fawd="1" />
<BUNS ptgp="bn39" bdx="39" fawd="1" awby="38" />
</AWD>
</PN>
<PN index="688" atv="1" bf="1183872">
<AWD cpbt="50" />
</PN>
</PNS>
</BPS>
<BPS define="88l6">
<CRIT>
<MNBS lmt="88" />
<MXBS lmt="88" />
<MXBT lmt="6" />
<MNBT lmt="6" />
</CRIT>
<PNS>
<PN index="0" atv="1" bf="28073582">
<AWD cpbt="150">
<BUNS ptgp="bn38" bdx="38" fawd="1" />
<BUNS ptgp="bn39" bdx="39" fawd="1" awby="38" />
</AWD>
</PN>
<PN index="1" atv="1" bf="16686973">
<AWD cpbt="150">
<BUNS ptgp="bn38" bdx="38" fawd="1" />
<BUNS ptgp="bn39" bdx="39" fawd="1" awby="38" />
</AWD>
</PN>
</PNS>
</BPS>
<BPS define="88l4">
<CRIT>
<MNBS lmt="88" />
<MXBS lmt="88" />
<MXBT lmt="4" />
<MNBT lmt="4" />
</CRIT>
<PNS>
<PN index="0" atv="1" bf="31342257">
<AWD cpbt="50">
<BUNS ptgp="bn38" bdx="38" fawd="1" />
<BUNS ptgp="bn39" bdx="39" fawd="1" awby="38" />
</AWD>
</PN>
<PN index="1" atv="1" bf="13761775">
<AWD cpbt="50">
<BUNS ptgp="bn38" bdx="38" fawd="1" />
<BUNS ptgp="bn39" bdx="39" fawd="1" awby="38" />
</AWD>
</PN>
</PNS>
</BPS>
<BPS define="88l2">
<CRIT>
<MNBS lmt="88" />
<MXBS lmt="88" />
<MXBT lmt="2" />
<MNBT lmt="2" />
</CRIT>
<PNS>
<PN index="0" atv="1" bf="16291759">
<AWD cpbt="10">
<BUNS ptgp="bn38" bdx="38" fawd="1" />
<BUNS ptgp="bn39" bdx="39" fawd="1" awby="38" />
</AWD>
</PN>
<PN index="1" atv="1" bf="15032283">
<AWD cpbt="10">
<BUNS ptgp="bn38" bdx="38" fawd="1" />
<BUNS ptgp="bn39" bdx="39" fawd="1" awby="38" />
</AWD>
</PN>
</PNS>
</BPS>
<BPS define="88l1">
<CRIT>
<MNBS lmt="88" />
<MXBS lmt="88" />
<MXBT lmt="1" />
<MNBT lmt="1" />
</CRIT>
<PNS>
<PN index="0" atv="1" bf="33278739">
<AWD>
<BUNS ptgp="bn38" bdx="38" fawd="1" />
<BUNS ptgp="bn39" bdx="39" fawd="1" awby="38" />
</AWD>
</PN>
<PN index="1" atv="1" bf="7261567">
<AWD>
<BUNS ptgp="bn38" bdx="38" fawd="1" />
<BUNS ptgp="bn39" bdx="39" fawd="1" awby="38" />
</AWD>
</PN>
<PN index="896" atv="1" bf="101540">
<AWD cpbt="10" />
</PN>
<PN index="897" atv="1" bf="3680792">
<AWD cpbt="10" />
</PN>
<PN index="898" atv="1" bf="25776896">
<AWD cpbt="10" />
</PN>
</PNS>
</BPS>
</BPSET>
<BPSET define="b94" use="b95">
<BPS define="88mx">
<PNS>
<PN index="422" atv="1" bf="11692089">
<AWD cpbt="9000" />
</PN>
<PN index="424" atv="1" bf="12200338">
<AWD cpbt="7200" />
</PN>
<PN index="427" atv="1" bf="24210225">
<AWD cpbt="6000" />
</PN>
</PNS>
</BPS>
</BPSET>
</BPSETS>
</PTC>
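One way to locate a parser error like the one mentioned at the top is `xmllint --noout`: it parses the file without echoing it, prints any error (with its line number) on stderr, and exits with a non-zero status. The sketch below assumes two hypothetical miniature files: one reproduces a typical trimming slip (an opening `<BPS>` typed where the closing `</BPS>` was meant), the other has it corrected.

```shell
#!/bin/sh
# Hypothetical miniature files: "bad" has <BPS> where </BPS> belongs.
bad=$(mktemp); good=$(mktemp)
cat > "$bad" <<'EOF'
<BPSETS>
  <BPSET define="b94" use="b95">
    <BPS define="88mx">
      <PNS/>
    <BPS>
  </BPSET>
</BPSETS>
EOF
# Build the corrected copy by turning the stray opening tag into a closing one.
sed 's|^    <BPS>$|    </BPS>|' "$bad" > "$good"

# --noout parses without printing the document; the tag-mismatch message
# (file, line, element names) goes to stderr and the exit status is non-zero.
if xmllint --noout "$bad"; then bad_result=ok; else bad_result=error; fi
if xmllint --noout "$good"; then good_result=ok; else good_result=error; fi

echo "bad: $bad_result, good: $good_result"
rm -f "$bad" "$good"
```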
What I really need is a query that returns all the attributes contained in a specific element under a specific index, e.g.:
<PTC version="2.0" cls="2">
<PN index="0" atv="1" bf="32203506">
<AWD cpbt="390">
<BUNS ptgp="bn38" bdx="38" fawd="1" />
<BUNS ptgp="bn39" bdx="39" fawd="1" awby="38" />
</AWD>
</PN>
A query that, given a PN index value (e.g. 0), would return the values of bf and cpbt.
If it were an SQL query, the xmllint query I'm looking for would be something like:
```sql
SELECT bf, cpbt FROM PTC.BPSETS.BPSET.BPS.PNS.PN
WHERE BPSET = "b95" AND BPS = "88lmax" AND PN.index = 0
```
If you see what I mean. Any guidance here would be appreciated. Thanks.
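The SQL-style request can be approximated with XPath predicates, one per `WHERE` condition, and `concat()` to return both values on one line. A sketch, assuming a hypothetical helper `pn_info` and a hypothetical trimmed-down stub; the arguments are pasted into the XPath unescaped, so they must not contain a single quote.

```shell
#!/bin/sh
# Hypothetical trimmed-down stub: two BPS blocks under BPSET b95,
# so the predicates have to pick the right one.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<PTC version="2.0" cls="2">
  <BPSETS>
    <BPSET define="b95">
      <BPS define="88lmax">
        <PNS>
          <PN index="0" atv="1" bf="32203506"><AWD cpbt="390"/></PN>
          <PN index="1" atv="1" bf="24237243"><AWD cpbt="390"/></PN>
        </PNS>
      </BPS>
      <BPS define="88l1">
        <PNS>
          <PN index="0" atv="1" bf="33278739"><AWD/></PN>
        </PNS>
      </BPS>
    </BPSET>
  </BPSETS>
</PTC>
EOF

# pn_info BPSET BPS INDEX FILE  (hypothetical helper)
# Roughly: SELECT bf, cpbt ... WHERE BPSET=$1 AND BPS=$2 AND PN.index=$3
pn_info() {
  pn_path="//PTC/BPSETS/BPSET[@define='$1']/BPS[@define='$2']/PNS/PN[@index='$3']"
  # concat() turns each node-set into the string value of its first node
  # (empty if the attribute is missing), so the result is one line.
  xmllint --xpath "concat($pn_path/@bf, ' ', $pn_path/AWD/@cpbt)" "$4"
}

row0=$(pn_info b95 88lmax 0 "$xml")
row1=$(pn_info b95 88lmax 1 "$xml")
nocpbt=$(pn_info b95 88l1 0 "$xml")   # this AWD has no cpbt attribute
printf '%s\n%s\n[%s]\n' "$row0" "$row1" "$nocpbt"
rm -f "$xml"
```

A missing attribute simply yields an empty field (here `33278739` followed by a trailing space), which keeps the output easy to split with awk or `read`.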
Posted on 2019-05-22 01:09:58
Further research and experimentation showed that this is the required syntax:
echo 'cat //PTC/BPSETS/BPSET[@define="b95"]/BPS[@define="88lmax"]/PNS/PN[@index="0"]/AWD/@cpbt' | xmllint --shell stub.xml
This produces the desired data.
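Every `xmllint` invocation parses the whole file again, which adds up on a 1.5-million-line document. The `--shell` form reads its commands from stdin, so several `cat` commands can be sent through one session and answered from a single parse. A sketch under that assumption, using a hypothetical trimmed stub and fetching both `bf` and `cpbt` for one PN:

```shell
#!/bin/sh
# Hypothetical trimmed-down stub with the same shape as the snippet.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<PTC version="2.0" cls="2">
  <BPSETS>
    <BPSET define="b95">
      <BPS define="88lmax">
        <PNS>
          <PN index="0" atv="1" bf="32203506"><AWD cpbt="390"/></PN>
        </PNS>
      </BPS>
    </BPSET>
  </BPSETS>
</PTC>
EOF

pn='//PTC/BPSETS/BPSET[@define="b95"]/BPS[@define="88lmax"]/PNS/PN[@index="0"]'

# One shell command per line on stdin; both run against a single parse.
# The captured output also contains the shell's "/ > " prompts.
out=$(printf '%s\n' "cat $pn/@bf" "cat $pn/AWD/@cpbt" | xmllint --shell "$xml")
printf '%s\n' "$out"
rm -f "$xml"
```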
https://stackoverflow.com/questions/56224699