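I want to convert XHTML to XML as shown below, but I don't know how to do it. I want to read the content of each input div.cmp-text element and put it into an attribute of an XML element.
Input XML: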
<?xml version="1.0" encoding="UTF-8"?>
<result>
<div class="cmp-text">
<strong xmlns="http://www.w3.org/1999/xhtml">Content</strong>
<span xmlns="http://www.w3.org/1999/xhtml"
class="data-class">May 19, 2020
</span>
<h2 xmlns="http://www.w3.org/1999/xhtml">Description</h2>
<p xmlns="http://www.w3.org/1999/xhtml">
Lorem ipsum dolor sit amet, consectetur adipisicing.
</p>
</div>
<div class="cmp-horizontal-line">
<hr xmlns="http://www.w3.org/1999/xhtml"/>
</div>
<div class="cmp-text">
<ul xmlns="http://www.w3.org/1999/xhtml">
<li>
Lorem ipsum.
</li>
</ul>
<table xmlns="http://www.w3.org/1999/xhtml"
style="border-collapse: collapse;"
border="1">
<tbody>
<tr>
<td style="width: 33.3333%;">111</td>
<td style="width: 33.3333%;">212</td>
</tr>
</tbody>
</table>
</div>
<div class="cmp-horizontal-line">
<hr xmlns="http://www.w3.org/1999/xhtml"/>
</div>
</result>
Expected output:
<?xml version="1.0" encoding="UTF-8"?>
<result xmlns:jcr="http://www.jcp.org/jcr/1.0"
xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
xmlns:mix="http://www.jcp.org/jcr/mix/1.0"
xmlns:sling="http://sling.apache.org/jcr/sling/1.0"
xmlns:cq="http://www.day.com/jcr/cq/1.0"
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<result>
<text
type="/text"
text="&lt;strong xmlns='http://www.w3.org/1999/xhtml'&gt;Content&lt;/strong&gt;&lt;span xmlns='http://www.w3.org/1999/xhtml' class='data-class'&gt;May 19, 2020&lt;/span&gt;&lt;h2 xmlns='http://www.w3.org/1999/xhtml'&gt;Description&lt;/h2&gt;&lt;p xmlns='http://www.w3.org/1999/xhtml'&gt;Lorem ipsum dolor sit amet, consectetur adipisicing.&lt;/p&gt;"
textIsRich="true"/>
<horizontal_line type="/horizontal-line"/>
<text type="/text"
text="&lt;ul xmlns='http://www.w3.org/1999/xhtml'&gt;&lt;li&gt;Lorem ipsum.&lt;/li&gt;&lt;/ul&gt;&lt;table xmlns='http://www.w3.org/1999/xhtml' style='border-collapse: collapse;' border='1'&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style='width: 33.3333%;'&gt;111&lt;/td&gt;&lt;td style='width: 33.3333%;'&gt;212&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;"
textIsRich="true"/>
<horizontal_line type="/horizontal-line"/>
</result>
</result>
XSL:
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns:jcr="http://www.jcp.org/jcr/1.0"
xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
xmlns:cq="http://www.day.com/jcr/cq/1.0"
xmlns:mix="http://www.jcp.org/jcr/mix/1.0"
xmlns:sling="http://sling.apache.org/jcr/sling/1.0">
<xsl:output version="1.0"
encoding="UTF-8"
indent="yes"
method="xml"
omit-xml-declaration="no"/>
<xsl:strip-space elements="*"/>
<!--root element-->
<xsl:template match="/">
<result>
<xsl:apply-templates/>
</result>
</xsl:template>
<!--template I need help with: it should take the input cmp-text div's content(HTML tags) and add it to the text attribute of text element-->
<xsl:template match="/result/div[@class='cmp-text']">
<text>
<xsl:attribute name="type">/text</xsl:attribute>
<xsl:attribute name="text">value</xsl:attribute>
<xsl:attribute name="text2">
<xsl:value-of select="node()"/>
</xsl:attribute>
<xsl:attribute name="text3">
<xsl:value-of select=".//*"/>
</xsl:attribute>
</text>
</xsl:template>
<!--horizontal line-->
<xsl:template match="/result/div[@class='cmp-horizontal-line']">
<horizontal_line type="/horizontal-line"/>
</xsl:template>
<!--horizontal line-->
<xsl:template match="/result/xhtml:div[@class='cmp-horizontal-line']">
<horizontal_line type="/horizontal-line"/>
</xsl:template>
<!--identity template copies everything forward by default-->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Output XML produced by the XSL above:
<result xmlns:jcr="http://www.jcp.org/jcr/1.0"
xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
xmlns:mix="http://www.jcp.org/jcr/mix/1.0"
xmlns:sling="http://sling.apache.org/jcr/sling/1.0"
xmlns:cq="http://www.day.com/jcr/cq/1.0"
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<result>
<text type="/text"
text="value"
text2="Last Reviewed:"
text3="Last Reviewed:"/>
<horizontal_line type="/horizontal-line"/>
<text type="/text"
text="value"
text2="Criteria"
text3="Criteria"/>
<horizontal_line type="/horizontal-line"/>
</result>
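</result>
In the text element, the attributes text, text2, and text3 are my failed attempts at getting the nodes (the HTML) into an attribute as-is.
How can I get the desired output?
Update: the desired output has been updated to be well-formed.
The solution needs to be in XSLT 1.0, so serialize() cannot be used.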
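Because serialize() is not available in XSLT 1.0, the markup has to be written out as text by ordinary templates. The following is only a minimal sketch of that idea, not a complete serializer: the mode name to-string is made up for illustration, it assumes unprefixed element names as in the input above, it trims text nodes with normalize-space(), and it does not escape special characters (&, <, quotes) inside text or attribute values.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Rebuild the root and process only its div children. -->
  <xsl:template match="/result">
    <result>
      <xsl:apply-templates select="div"/>
    </result>
  </xsl:template>

  <!-- Write the child elements of a cmp-text div into the text attribute. -->
  <xsl:template match="div[@class='cmp-text']">
    <text type="/text" textIsRich="true">
      <xsl:attribute name="text">
        <xsl:apply-templates select="*" mode="to-string"/>
      </xsl:attribute>
    </text>
  </xsl:template>

  <xsl:template match="div[@class='cmp-horizontal-line']">
    <horizontal_line type="/horizontal-line"/>
  </xsl:template>

  <!-- Hypothetical "to-string" mode: emits an element as markup text. -->
  <xsl:template match="*" mode="to-string">
    <xsl:value-of select="concat('&lt;', name())"/>
    <!-- Declare the default namespace only where it differs from the parent's. -->
    <xsl:if test="namespace-uri() != namespace-uri(..)">
      <xsl:text> xmlns='</xsl:text>
      <xsl:value-of select="namespace-uri()"/>
      <xsl:text>'</xsl:text>
    </xsl:if>
    <!-- Attribute values are copied without escaping (simplification). -->
    <xsl:for-each select="@*">
      <xsl:value-of select="concat(' ', name())"/>
      <xsl:text>='</xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>'</xsl:text>
    </xsl:for-each>
    <xsl:choose>
      <xsl:when test="node()">
        <xsl:text>&gt;</xsl:text>
        <xsl:apply-templates select="node()" mode="to-string"/>
        <xsl:value-of select="concat('&lt;/', name(), '&gt;')"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>/&gt;</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Text is trimmed and written as-is; it is not escaped (simplification). -->
  <xsl:template match="text()" mode="to-string">
    <xsl:value-of select="normalize-space()"/>
  </xsl:template>

</xsl:stylesheet>
```

How the generated < and > characters end up in the text attribute (for example as &lt; and &gt;) is decided by the processor's serializer, so the exact output may differ between XSLT processors.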
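Following Martin's comment, I used the xml-to-string stylesheet from lenzconsulting.com and was able to get the desired result with the following changes to the XSL script: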
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<xsl:import href="http://lenzconsulting.com/xml-to-string/xml-to-string.xsl"/>
<xsl:template match="/result/div[@class='cmp-text']">
<text>
<xsl:attribute name="type">/text</xsl:attribute>
<xsl:attribute name="text">
<xsl:apply-templates select="./*" mode="xml-to-string"/>
</xsl:attribute>
</text>
</xsl:template>
</xsl:stylesheet>
It produced the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<result xmlns:jcr="http://www.jcp.org/jcr/1.0"
xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
xmlns:mix="http://www.jcp.org/jcr/mix/1.0"
xmlns:sling="http://sling.apache.org/jcr/sling/1.0"
xmlns:cq="http://www.day.com/jcr/cq/1.0"
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<result>
<text
type="/text"
text="&lt;strong xmlns='http://www.w3.org/1999/xhtml'&gt;Content&lt;/strong&gt;&lt;span xmlns='http://www.w3.org/1999/xhtml' class='data-class'&gt;May 19, 2020&lt;/span&gt;&lt;h2 xmlns='http://www.w3.org/1999/xhtml'&gt;Description&lt;/h2&gt;&lt;p xmlns='http://www.w3.org/1999/xhtml'&gt;Lorem ipsum dolor sit amet, consectetur adipisicing.&lt;/p&gt;"
textIsRich="true"/>
<horizontal_line type="/horizontal-line"/>
<text type="/text"
text="&lt;ul xmlns='http://www.w3.org/1999/xhtml'&gt;&lt;li&gt;Lorem ipsum.&lt;/li&gt;&lt;/ul&gt;&lt;table xmlns='http://www.w3.org/1999/xhtml' style='border-collapse: collapse;' border='1'&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style='width: 33.3333%;'&gt;111&lt;/td&gt;&lt;td style='width: 33.3333%;'&gt;212&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;"
textIsRich="true"/>
<horizontal_line type="/horizontal-line"/>
</result>
</result>
Posted on 2022-09-28 15:51:22
So with XSLT 3.0 the template would be:
<!--template I need help with: it should take the input cmp-text div's content(HTML tags) and add it to the text attribute of text element-->
<xsl:template match="/result/div[@class='cmp-text']">
<text>
<xsl:attribute name="type">/text</xsl:attribute>
<xsl:attribute name="text" select="serialize(*)"/>
</text>
</xsl:template>
It can be simplified to, for example:
<!--template I need help with: it should take the input cmp-text div's content(HTML tags) and add it to the text attribute of text element-->
<xsl:template match="/result/div[@class='cmp-text']">
<text type="/text" text="{serialize(*)}"/>
</xsl:template>
That way the output looks more like this:
<text type="/text"
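text="&lt;strong xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&gt;Content&lt;/strong&gt;&lt;span xmlns=&quot;http://www.w3.org/1999/xhtml&quot; class=&quot;data-class&quot;&gt;May 19, 2020
 &lt;/span&gt;&lt;h2 xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&gt;Description&lt;/h2&gt;&lt;p xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&gt;
 Lorem ipsum dolor sit amet, consectetur adipisicing.
 &lt;/p&gt;"/>
If you really need to go that route and generate a result that is not well-formed, then in XSLT 3 a character map can help: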
<xsl:output version="1.0"
encoding="UTF-8"
indent="yes"
method="xml"
omit-xml-declaration="no" use-character-maps="m1"/>
<xsl:character-map name="m1">
<xsl:output-character character="&lt;" string="&lt;"/>
<xsl:output-character character="&gt;" string="&gt;"/>
<xsl:output-character character="&quot;" string="&quot;"/>
</xsl:character-map>
Saxon then produces output like this:
<text type="/text"
text='<strong xmlns="http://www.w3.org/1999/xhtml">Content</strong><span xmlns="http://www.w3.org/1999/xhtml" class="data-class">May 19, 2020
 </span><h2 xmlns="http://www.w3.org/1999/xhtml">Description</h2><p xmlns="http://www.w3.org/1999/xhtml">
 Lorem ipsum dolor sit amet, consectetur adipisicing.
 </p>'/>
https://stackoverflow.com/questions/73883490